Abstract



Improved procedures, in terms of smaller missed discovery rates (MDR), for performing
multiple hypotheses testing with weak and strong control of the family-wise error rate (FWER)
or the false discovery rate (FDR) are developed and studied. The improvement over existing
procedures such as the Sidak procedure for FWER control and the Benjamini-Hochberg
(BH) procedure for FDR control is achieved by exploiting possible differences in the powers
of the individual tests. Results signal the need to take into account the powers of the
individual tests and to have multiple hypotheses decision functions which are not limited to
simply using the individual p-values, as is the case, for example, with the Sidak, Bonferroni,
or BH procedures. They also enhance understanding of the role of the powers of individual
tests, or more precisely the receiver operating characteristic (ROC) functions of decision
processes, in the search for better multiple hypotheses testing procedures. A decision-theoretic
framework is utilized, and through auxiliary randomizers the procedures could be used with
discrete or mixed-type data or with rank-based nonparametric tests. This is in contrast to
existing p-value based procedures whose theoretical validity is contingent on each of these
p-value statistics being stochastically equal to or greater than a standard uniform variable under
the null hypothesis. Proposed procedures are relevant in the analysis of high-dimensional
"large M, small n" data sets arising in the natural, physical, medical, economic, and social
sciences, whose generation and creation is accelerated by advances in high-throughput
technology, notably, but not limited to, microarray technology.



Keywords and Phrases: Benjamini-Hochberg procedure; Bonferroni procedure; decision
process; false discovery rate (FDR); family-wise error rate (FWER); Lagrangian optimization;
Neyman-Pearson most powerful test; microarray analysis; reverse martingale; missed
discovery rate (MDR); multiple decision function and process; multiple hypotheses testing; 
optional sampling theorem; power function; randomized p-values; generalized multiple deci- 
sion p-values; ROC function; Sidak procedure. 

1 Introduction and Motivation 

Advances in modern technology, spearheaded by microarray technology, have led to the
creation or generation of many data sets characterized by a large number, M, of sets of
variables, with the $m$th set, $S_m$, composed of variables which pertain to characteristics of the
$m$th attribute of an observational unit. For historical reasons an attribute will be referred to
as a 'gene'. The variables in $S_m$ are only measured or observed for a small number of units.
Such variables may come in varied types such as being continuous, categorical, discrete,
mixed, or even as functional data. They may also possess an inherent data structure such as
being multi-group data, regression-type data, or event-time data with covariates
and with right-censoring or truncation. Such data sets will symbolically be represented by
the collection of random elements

$$\mathrm{DATA} = \{Z_{mj} : j = 1, 2, \ldots, n;\ m = 1, 2, \ldots, M\}$$

with n denoting the number of units observed, i.e., the number of replications. It is typical to
have $n \ll M$. To simplify notation and introduce conciseness and generality, the observables
for gene $m$ will be denoted by $X_m = \{Z_{mj} : j = 1, 2, \ldots, n_m\}$.

Efron [12], for example, described four such data sets. The first is a prostate data set from
[H] with M = 6033 genes, where with each gene is associated a variable $Z_{m1}$ indicating presence
(value = 1) or absence (value = 0) of prostate cancer and a variable $Z_{m2}$ representing a
continuous response. For the $m$th gene the random vector $Z_m = (Z_{m1}, Z_{m2})$ was observed on
n = 102 replications, and these 102 subjects were utilized to compare the diseased ($Z_{m1} = 1$)
and the non-diseased ($Z_{m1} = 0$) groups with respect to the response variable using a
two-sample t-test. The other three data sets described in [12] were an education data set
from [33] with M = 3748; a proteomics data set from [51] with M = 230 and n = 551;
and an imaging data set from [30] with M = 15445 and n = 12. In all of these data sets, a
decision is to be made for each gene, with the decision being a choice between two competing
hypotheses, obtaining an estimate of some parameter of interest, or predicting the value of
some function of $Z_{m0}$, the observable for a new unit.

In essence, these "large M, small n" data sets are the inputs in multiple decision problems,
called parallel inference problems in [12], with the most common type being that of
multiple hypotheses testing. In the latter problem, for the $m$th gene, there is a null hypothesis
$H_{m0}$ and an alternative hypothesis $H_{m1}$ for which a choice is to be made based on DATA.
These types of problems have spurred considerable research activity among researchers,
notably statisticians, since in performing multiple decision-making there is a crucial need to
be cognizant and cautious of the Hyde-ian nature of multiplicity, though other procedures,
especially those with an empirical Bayes flavor [12], have exploited the Jekyll-ian potentials
of multiplicity [17]. In multiple hypotheses testing this entails holding a tenuous balance
between two competing desires: control the rate at which correct null hypotheses are
erroneously rejected, but maintain the ability to discover correct alternative hypotheses.

Similar to classical single-pair hypothesis testing, an error committed when a correct null
hypothesis is rejected is referred to as of Type I, while one committed when a false null
hypothesis is not rejected is of Type II. There are several types of Type I error rates in the
multiple testing scenario; see, for example, [8] and [9]. In this paper we concern ourselves with
the weak family-wise error rate (FWER), which is the probability of rejecting at least one
null hypothesis when all the nulls are correct; the strong FWER, which is the probability of
rejecting at least one correct null hypothesis; and the false discovery rate (FDR), introduced
by [13] and [T], which is the expected proportion of the number of false rejections of nulls
relative to the number of rejections. We follow the usual convention where rejection of a null
hypothesis is called a discovery. On the other hand, the Type II error rate of interest will
be the missed discovery rate (MDR), which is the expected number of false non-rejections
of nulls. There are other Type II error rates in the multiple testing setting that have been
considered in the literature; see, for instance, [H], [IS], [Z], [H], and [S]. We justify our focus
on the MDR in Section 2.

Analogous to classical hypothesis testing, in multiple hypotheses testing the commission 
of a Type I error is considered more serious than that of a Type II error. Therefore, one 
framework in the development of multiple decision functions requires that one control a 
chosen Type I error rate at a pre-specified level, while making the MDR, or another Type 
II error rate, small, possibly minimal. For example, a procedure that controls the weak 
FWER, under an independence assumption among the genes, is the Sidak procedure [13];
while a more conservative one, but which does not require the independence condition, is the 
Bonferroni procedure [1]. For control of the FDR, the procedure introduced by Benjamini 
and Hochberg in their seminal paper [T], hence referred to as the BH procedure, achieves 
the desired control. Other works have dealt with related Type I error measures to the FDR. 
The papers [H], [lOl Hi], and [53] discussed controlling the mFDR, an error rate that is 
asymptotically equivalent to the FDR in some settings; see [17]. On the other hand, [HI |19]
dealt with the pFDR, also similar to the FDR and related to the local FDR in [12], the latter
having a Bayesian justification. Some other papers, such as [H], [27], and [21], focused on 
the estimation of the proportion of correct null hypotheses. 
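
For concreteness, the three single-threshold procedures named above can be sketched in a few lines of Python. This is only an illustrative sketch using the textbook forms of the rules (the Sidak threshold solving 1 - (1 - t)^M = alpha, the Bonferroni threshold alpha/M, and the BH step-up rule); the function names are hypothetical and are not taken from the paper.

```python
def sidak_threshold(alpha, M):
    """Common per-test threshold t with 1 - (1 - t)^M = alpha (independent tests)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / M)


def bonferroni_threshold(alpha, M):
    """Common per-test threshold alpha / M (no independence assumption needed)."""
    return alpha / M


def single_threshold_discoveries(pvalues, threshold):
    """Declare gene m discovered when its p-value is at most the common threshold."""
    return [p <= threshold for p in pvalues]


def benjamini_hochberg(pvalues, q):
    """BH step-up rule: reject the k smallest p-values, where k is the largest
    rank with p_(k) <= k * q / M (k = 0 means no discoveries)."""
    M = len(pvalues)
    order = sorted(range(M), key=lambda m: pvalues[m])
    k = 0
    for rank, m in enumerate(order, start=1):
        if pvalues[m] <= rank * q / M:
            k = rank
    discovered = [False] * M
    for m in order[:k]:
        discovered[m] = True
    return discovered
```

Note that all three rules look only at the p-values; none of them takes the powers of the individual tests as an input.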

Many multiple hypotheses testing procedures, such as the Sidak, Bonferroni, and BH 
procedures, rely on the set of p-values of the individual tests. The validity of these p-value
based multiple testing procedures is anchored on the technical requirement that each p-value
statistic is stochastically equal to or greater than a standard uniform variable under the null
hypothesis. This requirement, however, is not satisfied with non-continuous variables, or
when nonparametric tests, such as the Mann-Whitney-Wilcoxon two-sample test, are used.
It is also not apparent whether such p-value based multiple testing procedures are utilizing,
if at all, the power functions of the M tests, since many of them set a single threshold, and
genes whose p-values are smaller than this threshold are declared discovered. This approach 
may be acceptable in exchangeable settings, but perhaps not in situations where genes or 
subclasses of genes have different structures. See, for example, the papers [13] where there 
are subclasses of the genes possessing different structures, [15] where external covariates 
lead to non-exchangeability, and [31] which deals with settings with heterogeneous p-value 
distributions. These are situations where individual tests have different powers. Since power 
functions are germane for control of Type II error, would it not be the case that by imposing 
a common threshold on the p-values, the ability to control the Type II error rate will be 
compromised? 

These are the motivating issues and questions for this paper. We examine these issues 
in a decision-theoretic framework allowing for general data types and structures enabling 
results to be applicable even for discrete or mixed data and with rank-based nonparametric 
tests. We exploit the power functions of the individual tests to develop optimal or improved 
procedures that control, in weak and strong senses, FWER or FDR. The procedures also 
possess smaller Type II error rates. We surmise that it is more the rule, rather than the 
exception, that in multiple hypotheses testing, the individual tests will have different power 
traits, owing to varied distributional characteristics among the $X_m$s. This could be due to
differing variabilities of observed variables, differing effect sizes of interest, and possibly the 
use of different tests as dictated by the data type or structure, such as when some of the tests 
are t-tests, chi-square tests, analysis-of-variance F-tests, or nonparametric tests. It could be 
that the usual assumption of exchangeability of the M genes is more untenable than tenable. 
As a consequence, multiple testing procedures relying on the usual p-value statistics with a 
single-threshold rejection rule, such as the Sidak, Bonferroni, and BH procedures, cannot be 
expected to exploit differences in the powers of the individual tests. 

There are papers dealing with multiple testing procedures which improve on
single-thresholding procedures. Spjøtvoll [IB], dealing with simple null and alternative hypotheses,
maximized the average power among the M tests subject to a constraint on the expected
number of false discoveries. Westfall et al. [58] maximized the power in replicated clinical
trials involving multiple endpoints, where an adjustment is made on the significance levels of
the individual tests for each of the multiple endpoints under an FWER constraint. Since the
optimal solution relies on knowledge of the noncentrality parameters, a Bayesian approach
was used to get a handle on these noncentrality parameters. There were also papers that
approached the problem through the notion of weighted p-values such as [18], [55], [36], [25],
and [21]. In some of these papers, the optimal weights of the p-values were estimated with the
aid of prior information about the distributional parameters under the alternatives.
Interesting approaches toward the search for optimal multiple hypotheses testing procedures were
those in Storey [50] and Storey et al. [51], where a Neyman-Pearson approach was invoked to
obtain compound procedures, and Sun and Cai [53], where oracle and adaptive compound
rules were developed. Compound procedures are characterized by information borrowing
from each of the genes, so a decision function for a specific gene will utilize information from
other genes. Decision-theoretic and Bayesian approaches were also implemented in [21], [SH],
[12], |12j, and [19]. More recently, Efron [13] called for separate subclass analysis, while
Ferkingstad, et al. [15] proposed the use of external covariates, with these papers employing 
Bayes and empirical Bayes approaches. 

In their pioneering work, Neyman and Pearson [32] demonstrated that the most basic 
and fundamental type of hypotheses in the single-pair testing problem is that with simple 
null and alternative hypotheses. Their Fundamental Lemma, which revealed the existence 
and uniqueness of a most powerful (MP) test function, opened the doors to optimal classes 
of test functions in more complicated settings, leading to classes of test functions possessing 
properties such as uniformly most powerful (UMP), UMP unbiased, or UMP invariant, and 
led to the exploitation of the monotone likelihood ratio (MLR) property. Lehmann [25] 
provides a comprehensive account of the Neyman-Pearson framework of hypothesis testing, 
a framework which dictates that in the search for optimal test functions, the role of the 
power function is central and paramount. This framework also led to the divorce from the 
purely significance or p-value based approach to hypothesis testing which was then dominant
during the first quarter of the 20th century. 

It appears that, in a parallel manner, we are at the same juncture for the multiple
hypotheses testing problem as almost a century ago. Many current multiple testing procedures
are p-value based and do not exploit the power functions of the individual tests. It behooves
us to examine if better multiple testing procedures will arise by utilizing the individual power
functions, in parallel to what Neyman and Pearson did in the single-pair hypothesis testing
problem. This paper is an attempt in this direction. By considering the most basic, but we
consider the most fundamental, setting in this multiple hypotheses testing situation, we will
study multiple decision functions in situations where, for each gene, the null and alternative
hypotheses are simple. This is also the setting considered recently in Roquain and van de
Wiel [31]. In the search for multiple decision functions this will allow as starting point the
most powerful test for each of the M pairs of hypotheses, with the test's existence guaranteed 
by the Neyman-Pearson Lemma. Each of these MP tests will have a power, but we will see 
that it is beneficial to look at each of these powers as functions of their MP test's size. These 
functions are the so-called receiver operating characteristic (ROC) functions. 

We outline the contents of this paper. In Section 2 we present the decision-theoretic
framework which will serve as a platform for obtaining the multiple decision functions. This
entails describing the probability models, an independence condition underlying the model,
relevant loss functions, multiple decision functions, and risk functions. The Type I and
Type II error rates will be informed by the choice of loss functions. We also justify the
choice of the MDR as the Type II error rate of interest. Section 3 provides a review and
re-examination of most powerful tests and p-values. It will also examine properties of the
ROC function, which will become central in later developments. We utilize the results of
Section 3 to find the optimal weak FWER-controlling procedure in Section 4. The existence
will be addressed in Subsection 4.2, whereas Subsection 4.3 will deal with the uniqueness.
Subsection 4.4 provides an explicit method for determining the optimal solution when the
ROC functions are differentiable. Subsection 4.5 illustrates the theory for a specific concrete
multiple testing situation involving normal distributions. In this example we show the gain in
efficiency of the optimal weak FWER-controlling procedure relative to the Sidak procedure.
Subsection 4.6 discusses an interesting feature of the weak FWER-controlling procedure
which bears on how to optimally invest the overall FWER-size to each of the tests and points
out a tangential manifestation of the strategy during the 2008 Presidential election in the
United States. Section 5 discusses limitations, extensions, and connections of the problem
considered. In Subsection 5.1 the restriction of the optimization problem to the class of
simple procedures is discussed in relation to those in [50], [16]. In Subsection 5.2 extensions
to situations with composite hypotheses when a monotone likelihood ratio property holds are
indicated. The role of effect sizes is also discussed and some strategies are indicated when
the alternative hypothesis probability measures or effect sizes are not known. Subsection
5.3 relates the weak FWER-controlling optimal procedure to p-value based procedures and
distributions of p-value statistics.

Section 6 develops an improved procedure which strongly controls the FWER, whereas
Section 7 develops an improved procedure which strongly controls the FDR. These new
procedures possess better Type II error rate performance than the sequential Sidak and BH
procedures. The development of these new procedures is anchored on the weak
FWER-controlling optimal procedure in the earlier sections. It is shown that the sequential Sidak
and BH procedures are special cases of these more general procedures.

In Section 8 we provide a modest simulation study demonstrating that the new
FDR-controlling procedure improves on the BH procedure with respect to the MDR for the normal
model considered in the simulation. Concluding remarks are provided in Section 9.

2 Mathematical Setting 

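Let $(\Omega, \mathcal{F}, P)$ be the basic probability space on which all random entities are defined and
$\mathcal{M} = \{1, 2, \ldots, M\}$ be an index set, with $M$ a known positive integer. For each $m \in \mathcal{M}$,
let $X_m : (\Omega, \mathcal{F}) \to (\mathcal{X}_m, \mathcal{B}_m)$, where $\mathcal{X}_m$ is some space with associated $\sigma$-field of subsets $\mathcal{B}_m$.
Form the product space $(\mathcal{X}, \mathcal{B})$ with $\mathcal{X} = \times_{m \in \mathcal{M}} \mathcal{X}_m$ and $\mathcal{B} = \sigma(\times_{m \in \mathcal{M}} \mathcal{B}_m)$, so that

$$X = (X_1, X_2, \ldots, X_M) : (\Omega, \mathcal{F}) \to (\mathcal{X}, \mathcal{B}).$$

The induced probability measure of $X$ is $Q = PX^{-1}$, while the (marginal) probability
measure of $X_m$ is $Q_m = PX_m^{-1}$, which is also

$$Q_m(B_m) = Q(\mathcal{X}_1 \times \cdots \times \mathcal{X}_{m-1} \times B_m \times \mathcal{X}_{m+1} \times \cdots \times \mathcal{X}_M), \quad \forall B_m \in \mathcal{B}_m.$$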

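For each $m \in \mathcal{M}$, let $Q_{m0}$ and $Q_{m1}$ be two known probability measures on $(\mathcal{X}_m, \mathcal{B}_m)$. In
conjunction with first treating the case with simple null and alternative hypotheses for each
$m \in \mathcal{M}$, we assume that $Q$ belongs to $\mathcal{Q}$, the collection of all probability measures on $(\mathcal{X}, \mathcal{B})$
whose marginal probability measures $Q_m$s satisfy $Q_m \in \{Q_{m0}, Q_{m1}\}$ for each $m \in \mathcal{M}$. Let

$$\theta = (\theta_1, \theta_2, \ldots, \theta_M)^{\mathsf{t}} : \mathcal{Q} \to \{0, 1\}^M$$

be defined according to $\theta_m(Q) = I\{Q_m = Q_{m1}\}$, where $I\{\cdot\}$ is the indicator function. The
vector $\theta(Q)$ is the state of the marginal probability measures of $Q$. Define, for each $Q \in \mathcal{Q}$,
the subcollections

$$\mathcal{M}_0 \equiv \mathcal{M}_0(Q) = \{m \in \mathcal{M} : \theta_m(Q) = 0\}; \tag{2.1}$$
$$\mathcal{M}_1 \equiv \mathcal{M}_1(Q) = \{m \in \mathcal{M} : \theta_m(Q) = 1\}. \tag{2.2}$$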

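In this paper we shall impose an independence condition given by:

Condition (I): $(X_m, m \in \mathcal{M}_0(Q))$ is an independent collection of random entities, that is,
$\forall B_m \in \mathcal{B}_m$,

$$Q\left(\times_{m \in \mathcal{M}_0(Q)} B_m\right) = \prod_{m \in \mathcal{M}_0(Q)} Q_m(B_m). \tag{2.3}$$

The collection $(X_m, m \in \mathcal{M}_1(Q))$ need not be an independent collection, but we do
assume that this collection is independent of $(X_m, m \in \mathcal{M}_0(Q))$. Two extreme subcollections
of $\mathcal{Q}$ are

$$\mathcal{Q}_0 = \{Q \in \mathcal{Q} : \theta_m(Q) = 0,\ \forall m \in \mathcal{M}\}; \tag{2.4}$$
$$\mathcal{Q}_1 = \{Q \in \mathcal{Q} : \theta_m(Q) = 1,\ \forall m \in \mathcal{M}\}. \tag{2.5}$$

By condition (I), $\mathcal{Q}_0$ is a singleton set. We denote by $Q_0$ its element. On the other hand, $\mathcal{Q}_1$
need not be a singleton set.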

Stated in its most basic form, the decision problem is to determine $\mathcal{M}_0(Q)$ and $\mathcal{M}_1(Q)$
based on $X$. This can be restated as a multiple hypotheses testing problem where one is
interested in simultaneously testing, based on $X$, the $M$ pairs of hypotheses $H_{m0} : Q_m = Q_{m0}$
versus $H_{m1} : Q_m = Q_{m1}$ for $m \in \mathcal{M}$. These pairs of hypotheses could also be stated in terms
of the $\theta$-vector via $H_{m0} : \theta_m(Q) = 0$ versus $H_{m1} : \theta_m(Q) = 1$.
We approach this problem in a decision-theoretic framework which is somewhat similar
to that in Sarkar, Zhou and Ghosh [Ml]. The elements of this framework are as follows.
The action space is $\mathcal{A} = \{0, 1\}^M$ with generic element $a = (a_1, a_2, \ldots, a_M)^{\mathsf{t}} \in \mathcal{A}$, with
the interpretation that $a_m = 0$ ($1$) means that $H_{m0}$ is accepted (rejected). The parameter
space is $\mathcal{Q}$, though the effective parameter space is $\Theta = \{0, 1\}^M$ with generic element
$\theta = (\theta_1, \theta_2, \ldots, \theta_M)^{\mathsf{t}}$. For this decision problem, we introduce several loss functions,
$L : \mathcal{A} \times \mathcal{Q} \to \Re_+$, defined via

$$L_{0k}(a, Q) = I\{a^{\mathsf{t}}(\mathbf{1} - \theta(Q)) \ge k\}, \quad k = 1, 2, \ldots, M; \tag{2.6}$$
$$L_1(a, Q) = \frac{a^{\mathsf{t}}(\mathbf{1} - \theta(Q))}{a^{\mathsf{t}}\mathbf{1}}\, I\{a^{\mathsf{t}}\mathbf{1} > 0\}; \tag{2.7}$$
$$L_2(a, Q) = (\mathbf{1} - a)^{\mathsf{t}}\theta(Q), \tag{2.8}$$

with the convention that $0/0 = 0$ and $\mathbf{1}$ is an $M \times 1$ vector of 1s. The interpretations of
these loss functions are as follows. The loss function $L_{0k}(a, Q)$ equals 1 if and only if at
least $k$ false discoveries are committed, so when $k = 1$, $L_{01}(a, Q)$ becomes 1 if and only if at
least one false discovery is committed. The loss $L_1(a, Q)$ could be interpreted as the false
discovery proportion, since it is the ratio of the number of false discoveries and the number
of discoveries; whereas the loss $L_2(a, Q)$ is the number of missed discoveries since it is the
number of true alternative hypotheses that were not discovered. We focus on this missed
discovery number since the relevant question is how many correct alternatives ($\theta(Q)^{\mathsf{t}}\mathbf{1}$) were
missed by using the action $a$? See also [34], which essentially uses this loss function to induce
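their power metric. Other types of losses, such as the false negative proportion with

$$L(a, Q) = \frac{(\mathbf{1} - a)^{\mathsf{t}}\theta(Q)}{(\mathbf{1} - a)^{\mathsf{t}}\mathbf{1}}\, I\{(\mathbf{1} - a)^{\mathsf{t}}\mathbf{1} > 0\},$$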

have also been considered in the literature such as in [T7] and 
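
The loss functions (2.6)-(2.8) are easy to evaluate once an action vector and a state vector are given. The following is a minimal Python sketch, assuming both are passed as lists of 0s and 1s; the function names are hypothetical and serve only to illustrate the definitions, including the 0/0 = 0 convention.

```python
def false_discoveries(a, theta):
    """Number of rejected nulls that are actually correct: a^t (1 - theta)."""
    return sum(am * (1 - tm) for am, tm in zip(a, theta))


def loss_0k(a, theta, k):
    """L_0k: equals 1 if and only if at least k false discoveries are committed."""
    return 1 if false_discoveries(a, theta) >= k else 0


def loss_1(a, theta):
    """L_1: false discovery proportion, with the convention 0/0 = 0."""
    discoveries = sum(a)
    if discoveries == 0:
        return 0.0
    return false_discoveries(a, theta) / discoveries


def loss_2(a, theta):
    """L_2: number of missed discoveries, (1 - a)^t theta."""
    return sum((1 - am) * tm for am, tm in zip(a, theta))
```

For example, with a = [1, 1, 0, 0, 1] and theta = [1, 0, 1, 0, 0] there are three discoveries, two of them false, and one correct alternative is missed.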

A nonrandomized multiple decision function (MDF) is a $\delta : (\mathcal{X}, \mathcal{B}) \to (\mathcal{A}, \sigma(\mathcal{A}))$, where
$\sigma(\mathcal{A})$ is the power set of $\mathcal{A}$. Such an MDF may be represented by

$$\delta(x) = (\delta_1(x), \delta_2(x), \ldots, \delta_M(x))^{\mathsf{t}},$$

where $\delta_m(x) \in \{0, 1\}$. In general, each $\delta_m$ could be made to depend on the full data $x$ instead
of just $x_m$. We denote by $\mathcal{D}$ the class of all nonrandomized MDFs. A randomized MDF may
also be considered. Denote by $\mathcal{P}(\mathcal{A})$ the space of all probability measures over $(\mathcal{A}, \sigma(\mathcal{A}))$. A
randomized MDF is a $\delta^* : (\mathcal{X}, \mathcal{B}) \to (\mathcal{P}(\mathcal{A}), \sigma(\mathcal{P}(\mathcal{A})))$. For a realization $X = x$, an action
is chosen from $\mathcal{A}$ according to the probability measure $\delta^*(x)$. Denote by $\mathcal{D}^*$ the space of all
possible randomized MDFs. Clearly, $\mathcal{D} \subset \mathcal{D}^*$. It is easy to see that by augmenting the data
$X$ with a randomizer $U \sim U(0, 1)$, which is independent of $X$, randomized MDFs could be
made into nonrandomized MDFs with respect to the augmented data $(X, U)$. Henceforth, $\mathcal{D}$
will represent all nonrandomized MDFs $\delta(X, U)$s based on $(X, U)$.

For brevity of notation, in the sequel, $P_Q\{f(X, U) \in B\}$ and $E_Q\{f(X, U)\}$ will represent
probability and expectation with respect to $(X, U)$ when $X \sim Q$, $U \sim U(0, 1)$, and $X$ and
$U$ are independent. For each $\delta \in \mathcal{D}$ and the loss functions defined earlier, we then have the
risk functions

$$R_{0k}(\delta, Q) = E_Q\{L_{0k}(\delta(X, U), Q)\}, \quad k = 1, 2, \ldots, M; \tag{2.9}$$
$$R_1(\delta, Q) = E_Q\{L_1(\delta(X, U), Q)\}; \tag{2.10}$$
$$R_2(\delta, Q) = E_Q\{L_2(\delta(X, U), Q)\}. \tag{2.11}$$

Given a $\delta = (\delta_1, \delta_2, \ldots, \delta_M)^{\mathsf{t}}$, let $\pi_\delta(Q) = (\pi_{\delta_1}(Q), \pi_{\delta_2}(Q), \ldots, \pi_{\delta_M}(Q))^{\mathsf{t}}$, where $\pi_{\delta_m}(Q) =
E_Q\{\delta_m(X, U)\}$, be its associated vector of power functions. We may then re-express (2.11)
via

$$R_2(\delta, Q) = (\mathbf{1} - \pi_\delta(Q))^{\mathsf{t}}\theta(Q). \tag{2.12}$$

In terms of these risk functions, for an MDF $\delta \in \mathcal{D}$, its weak FWER is $\mathrm{FWER}(\delta) =
R_{01}(\delta, Q_0)$, and its weak $k$-FWER is $k\text{-FWER}(\delta) = R_{0k}(\delta, Q_0)$ for $k > 1$. If each $\delta_m$ depends
only on $X_m$ and $U$, by condition (I),

$$\mathrm{FWER}(\delta) = 1 - E\left\{\prod_{m \in \mathcal{M}} \left[1 - P_{Q_{m0}}\{\delta_m(X_m, U) = 1 \mid U\}\right]\right\} \tag{2.13}$$

where expectation is over the randomizer $U$. An alternative formulation when $Q = Q_0$ and
with the $m$th component of the randomized MDF $\delta^*$ depending only on $X_m$ is to have
$U = (U_1, U_2, \ldots, U_M)$ consisting of independent and identically distributed $U(0, 1)$ variables
and independent of the $X_m$s. The $m$th component $\delta_m^*(X_m)$ ($\in [0, 1]$) may be redefined via
$\delta_m(X_m, U_m) = I\{U_m \le \delta_m^*(X_m)\}$, making $\delta(X, U) = (\delta_m(X_m, U_m), m \in \mathcal{M})^{\mathsf{t}}$ a nonrandomized
MDF depending on $(X, U)$. In this case, (2.13) becomes

$$\mathrm{FWER}(\delta) = 1 - \prod_{m \in \mathcal{M}} \left[1 - P_{Q_{m0}}\{\delta_m(X_m, U_m) = 1\}\right]. \tag{2.14}$$
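
Formula (2.14) says that, for a simple MDF under condition (I), the weak FWER depends only on the sizes of the component tests. A minimal Python sketch of this computation follows. The names are hypothetical, and the weighted allocation shown is only one illustrative way of splitting the overall size unequally; it is not claimed to be the optimal allocation developed later in the paper.

```python
import math


def weak_fwer(sizes):
    """Weak FWER of a simple MDF whose m-th test has the given size, as in (2.14)."""
    return 1.0 - math.prod(1.0 - s for s in sizes)


def sidak_sizes(alpha, M):
    """Equal sizes 1 - (1 - alpha)^(1/M) used by the Sidak procedure."""
    return [1.0 - (1.0 - alpha) ** (1.0 / M)] * M


def weighted_sizes(alpha, weights):
    """Hypothetical unequal sizes 1 - (1 - alpha)^(w_m), with weights summing to 1.

    The product of (1 - size_m) is then (1 - alpha), so the weak FWER is alpha.
    """
    return [1.0 - (1.0 - alpha) ** w for w in weights]
```

Both `weak_fwer(sidak_sizes(0.05, 50))` and `weak_fwer(weighted_sizes(0.05, [0.5, 0.3, 0.2]))` equal 0.05 up to rounding, which shows that the same overall FWER can be spent equally or unequally across the tests.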

The risk function $R_1(\delta, Q)$ is the false discovery rate (FDR) of $\delta$ at $Q$, the error rate
introduced in [I]; while the risk function $R_2(\delta, Q)$ will be called the missed discovery rate
(MDR) of $\delta$ at $Q$. The adjective 'rate' may be somewhat misleading since $R_2(\delta, Q)$ takes
values in $[0, |\mathcal{M}_1(Q)|]$ instead of $[0, 1]$; however, this does not cause difficulty since given the
true underlying probability measure $Q$ of $X$, $|\mathcal{M}_1(Q)|$ is a constant. This risk is related
to the expected number of true positives (ETP), an error measure used in and [SU],
via $\mathrm{ETP}(\delta, Q) = |\mathcal{M}_1(Q)| - R_2(\delta, Q)$; see Subsection 5.1 for more discussions of these error
measures.

In analogy with the Neyman-Pearson framework for testing a null hypothesis versus an
alternative hypothesis, the risks $R_{0k}(\delta, Q)$ and $R_1(\delta, Q)$ will be viewed as Type I error rates,
whereas the risk $R_2(\delta, Q)$ will be a Type II error rate. As Type I errors are considered more
serious than Type II errors, in the search for good MDFs, Type I error rates are required
not to exceed a pre-specified threshold. Subject to this constraint, an MDF is to be chosen
whose Type II error rate is small, if not minimal.

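Thus, to find an optimal MDF that provides weak FWER control in a subclass $\mathcal{D}_0 \subset \mathcal{D}$, a
threshold $\alpha \in (0, 1)$ is specified, and it is desired to find an MDF $\delta^* \in \mathcal{D}_0$ with $R_{01}(\delta^*, Q_0) =
\mathrm{FWER}(\delta^*) \le \alpha$, and such that for any other $\delta \in \mathcal{D}_0$ satisfying $R_{01}(\delta, Q_0) = \mathrm{FWER}(\delta) \le \alpha$,
we have

$$\sup_{Q \in \mathcal{Q}} R_2(\delta^*, Q) \le \sup_{Q \in \mathcal{Q}} R_2(\delta, Q). \tag{2.15}$$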

The criterion in (2.15) is of a minimax flavor. One may require only that $R_2(\delta^*, Q^*) \le
R_2(\delta, Q^*)$ where $Q^*$ is the true, but unknown, probability law of $X$; however, this condition
may be too strong so as to preclude a solution to the optimization problem, though see
Storey [50] (also Section 5) for a situation which uses a different Type I error and where an
optimal, albeit an oracle, solution with respect to minimizing $R_2(\delta, Q^*)$ is possible. Observe
that for any $\delta \in \mathcal{D}$, using the representation of $R_2(\delta, Q)$ in (2.12),




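$$\sup_{Q \in \mathcal{Q}} R_2(\delta, Q) = \sup_{Q \in \mathcal{Q}} (\mathbf{1} - \pi_\delta(Q))^{\mathsf{t}}\theta(Q) = M - \sum_{m \in \mathcal{M}} \pi_{\delta_m}(Q_{m1}). \tag{2.16}$$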



Thus, the optimality condition on the MDR in (2.15) is equivalent to maximizing the sum of
the powers of the individual components of the MDF, that is, maximizing $\sum_{m \in \mathcal{M}} \pi_{\delta_m}(Q_{m1})$.
Interestingly, if we had innocently standardized the loss function $L_2(a, Q)$ in order to take
values in $[0, 1]$ via division by $|\mathcal{M}_1(Q)| = \theta(Q)^{\mathsf{t}}\mathbf{1}$, the number of correct alternatives, then
the minimax justification above does not carry through!
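
The minimax remark can be checked numerically for a simple MDF, whose $m$th component has a fixed power under $Q_{m1}$. The Python sketch below is only an illustration under that assumption: it replaces the supremum over probability measures by a brute-force maximum over the state vectors in {0, 1}^M, and the function names are hypothetical.

```python
from itertools import product


def mdr(powers, theta):
    """R_2 for a simple MDF: sum over true alternatives of (1 - power_m)."""
    return sum(t * (1.0 - p) for p, t in zip(powers, theta))


def sup_mdr(powers):
    """Largest MDR over all state vectors theta in {0, 1}^M."""
    M = len(powers)
    return max(mdr(powers, theta) for theta in product((0, 1), repeat=M))


def sup_standardized_mdr(powers):
    """Largest MDR divided by the number of true alternatives (when positive)."""
    M = len(powers)
    return max(
        mdr(powers, theta) / sum(theta)
        for theta in product((0, 1), repeat=M)
        if sum(theta) > 0
    )
```

With powers [0.9, 0.6, 0.3] and with powers [0.6, 0.6, 0.6], `sup_mdr` returns M minus the sum of the powers (1.2 in both cases), whereas `sup_standardized_mdr` returns one minus the smallest power (0.7 and 0.4, respectively). In this sketch the standardized criterion is therefore no longer a function of the sum of the powers.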

In contrast, for strong FWER control, given the threshold of $\alpha$, it is desired to find a
compound MDF, $\delta^* \in \mathcal{D}$, such that

$$R_{01}(\delta^*, Q^*) \le \alpha \tag{2.17}$$

whatever the true, but unknown, probability law $Q^*$ of $X$ is, and such that $\sum_{m \in \mathcal{M}} \pi_{\delta_m^*}(Q_{m1})$
is large, if not maximal, among all $\delta \in \mathcal{D}$ satisfying $R_{01}(\delta, Q^*) \le \alpha$. Clearly, analogous
requirements exist for weak and strong $k$-FWER control. However, in this paper, due to
space limitations, we will not focus on this $k$-FWER error rate for $k > 1$.

For (strong) FDR-control, a threshold $q^* \in (0, 1)$ is specified and we seek a compound
MDF, $\delta^* \in \mathcal{D}$, such that, whatever $Q^*$ is,

$$R_1(\delta^*, Q^*) \le q^*, \tag{2.18}$$

and for any other $\delta \in \mathcal{D}$ satisfying $R_1(\delta, Q^*) \le q^*$, we have

$$\inf_{Q} \sum_{m \in \mathcal{M}} \pi_{\delta_m^*}(Q) \ge \inf_{Q} \sum_{m \in \mathcal{M}} \pi_{\delta_m}(Q).$$

The crucial aspect in strong control, both for FWER and FDR, is that the constraints in
(2.17) and (2.18) need to hold whatever the unknown $Q^*$ is, in contrast to the weak FWER
constraint where it is only required at $Q_0 \in \mathcal{Q}_0$. For more discussion of weak and strong
control, see [8] and [9], and for discussion of optimality conditions in multiple testing, refer
to [29], wherein maximin optimality results were established for some step-down and step-up
MTPs.

3 Revisiting MP Tests and p-Value Statistics

The initial subclass of $\mathcal{D}$ which is of interest is $\mathcal{D}_0$, the subclass of simple MDFs. A simple
MDF $\delta(X, U)$ is one whose $m$th component $\delta_m$ depends only on $(X_m, U_m)$ for every $m \in \mathcal{M}$.
Within $\mathcal{D}_0$ we examine the existence, uniqueness, and structure of an optimal MDF that
controls FWER in the weak sense. This optimal MDF will serve later as an anchor for developing
improved MDFs that strongly control FWER and FDR. These new MDFs will be outside
the subclass $\mathcal{D}_0$, hence are called compound MDFs.

But, first, for the single-pair hypotheses testing problem, we introduce the notion of a test 
or decision process together with its ROC function and p-value statistic. We revisit most 
powerful (MP) tests and obtain properties of the ROC function and the p-value statistic 
associated with the MP test process. These properties will be needed in later developments. 

3.1 Decision Processes and ROC Functions 

Let $X : (\Omega, \mathcal{F}) \to (\mathcal{X}, \mathcal{B})$ be an observable random entity and $Q = PX^{-1}$ its probability
measure. Consider testing, based on $X$, the pair of hypotheses $H_0 : Q = Q_0$ versus $H_1 : Q =
Q_1$, where $Q_0$ and $Q_1$ are two probability measures on $(\mathcal{X}, \mathcal{B})$. Denote by $q_0$ and $q_1$ versions
of the density functions of $Q_0$ and $Q_1$, respectively, with respect to some fixed dominating
measure $\nu$, e.g., $\nu = Q_0 + Q_1$, or it could be counting or Lebesgue measure. Recall that a
test function is a measurable function $\delta : (\mathcal{X}, \mathcal{B}) \to ([0, 1], \sigma[0, 1])$, where $\sigma[0, 1]$ is the Borel
sigma-field on $[0, 1]$. Given $X = x$, $\delta(x)$ is the probability of deciding in favor of $H_1$. Its size
is $\alpha_\delta = E_{Q_0}\delta(X)$. It is of level $\alpha \in [0, 1]$ if $\alpha_\delta \le \alpha$. Its power is $\pi_\delta = E_{Q_1}\delta(X)$. Recall that a
test function $\delta^*$ is most powerful (MP) of level $\alpha$ if $\alpha_{\delta^*} \le \alpha$ and for any other test function
$\delta$ with $\alpha_\delta \le \alpha$, we have $\pi_{\delta^*} \ge \pi_\delta$. We now introduce the notion of a decision process and its
associated ROC curve.

Definition 3.1 A collection $\Delta = \{\delta_\eta : \eta \in [0, 1]\}$ of test functions satisfying the conditions
that, a.e. $[Q]$, $\delta_0(x) = 0$, $\delta_1(x) = 1$, and $\eta \mapsto \delta_\eta(x)$ is nondecreasing and right-continuous,
is a decision process.

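Definition 3.2 For a decision process $\Delta = \{\delta_\eta : \eta \in [0, 1]\}$, its size function is $A_\Delta : [0, 1] \to
[0, 1]$ and its power function is $\rho_\Delta : [0, 1] \to [0, 1]$, where $A_\Delta(\eta) = \alpha_{\delta_\eta} = E_{Q_0}\delta_\eta(X)$ and
$\rho_\Delta(\eta) = \pi_{\delta_\eta} = E_{Q_1}\delta_\eta(X)$. Its receiver operating characteristic (ROC) curve is $\mathrm{ROC}(\Delta) =
\mathrm{Graph}\{(A_\Delta(\eta), \rho_\Delta(\eta)) : \eta \in [0, 1]\}$. When $A_\Delta(\eta) = \eta$ for all $\eta \in [0, 1]$, the mapping $\eta \mapsto
\rho_\Delta(\eta)$ is called the ROC function of $\Delta$.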

The use of the phrase power function in Definition 3.2 may be construed a misnomer
since we are not viewing this as a function of a parameter as is the usual meaning of the
phrase. However, for lack of a better name, we adopt this terminology. Also, when the
decision process $\Delta$ satisfies the condition that $\forall \eta \in [0, 1] : A_\Delta(\eta) = \eta$, it is said to be
size-valid. In the sequel, when convenient notationally, $\delta_\eta$ and $\delta(\eta)$ will be used interchangeably
to represent $\delta(\cdot; \eta)$.
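
As an illustration of Definitions 3.1 and 3.2, consider the hypothetical single-pair problem of testing N(0, 1) against N(mu, 1) with mu > 0, and the family of one-sided tests that reject when the upper-tail probability of the observation is at most eta. The Python sketch below evaluates the size and power functions of this family; the model and the function names are assumptions made for the example and are not part of the definitions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, the assumed null distribution


def decision(x, eta):
    """delta_eta(x) = I{1 - Phi(x) <= eta}; nondecreasing and right-continuous in eta."""
    if eta <= 0.0:
        return 0
    if eta >= 1.0:
        return 1
    return 1 if 1.0 - STD_NORMAL.cdf(x) <= eta else 0


def size_function(eta):
    """Probability of rejecting under N(0, 1); equals eta, so the process is size-valid."""
    if eta <= 0.0:
        return 0.0
    if eta >= 1.0:
        return 1.0
    cutoff = STD_NORMAL.inv_cdf(1.0 - eta)
    return 1.0 - STD_NORMAL.cdf(cutoff)


def roc_function(eta, mu):
    """Probability of rejecting under N(mu, 1), viewed as a function of the size eta."""
    if eta <= 0.0:
        return 0.0
    if eta >= 1.0:
        return 1.0
    cutoff = STD_NORMAL.inv_cdf(1.0 - eta)
    return 1.0 - STD_NORMAL.cdf(cutoff - mu)
```

Evaluating `roc_function` on a grid of eta values for two different values of mu shows how two tests with the same size can have quite different powers, which is the heterogeneity the paper seeks to exploit.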

Let $L : (\mathcal{X}, \mathcal{B}) \to (\Re_+, \sigma(\Re_+))$ be a version of the likelihood ratio function, so $L(x) =
q_1(x)/q_0(x)$ a.e. $[\nu]$. Let $G_0(\cdot)$ and $G_1(\cdot)$ be the distribution functions of $L(X)$ when $\mathcal{L}(X) =
Q_0$ and $\mathcal{L}(X) = Q_1$, respectively, where $\mathcal{L}(X)$ is the probability measure of $X$. For a monotone
nondecreasing right-continuous function $M(\cdot)$ from $\Re$ into $\Re$, let

$$M^{-1}(r) = \inf\{x \in \Re : M(x) \ge r\} \quad \text{and} \quad \Delta M(r) = M(r) - M(r-).$$

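The Neyman-Pearson Fundamental Lemma [32] states that the MP test function of level $\eta$ for testing $H_0$ versus $H_1$ is

$$\delta^*(X; \eta) \equiv \delta^*_\eta = I\{L(X) > c(\eta)\} + \gamma(\eta) I\{L(X) = c(\eta)\}, \tag{3.1}$$

where $c(\eta) = G_0^{-1}(1 - \eta)$ and $\gamma(\eta) = (G_0(c(\eta)) - (1 - \eta))/\Delta G_0(c(\eta))$. Let $U \sim U(0,1)$ be independent of $X$. Redefine $\delta^*$ via

$$\delta^{**}(X, U; \eta) \equiv \delta^{**}_\eta = I\{\delta^*(X; \eta) = 1\} + I\{\delta^*(X; \eta) = \gamma(\eta);\ U \le \gamma(\eta)\}.$$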

This is nonrandomized with respect to $(X, U)$, so with the aid of the auxiliary randomizer $U$, the MP test could always be made nonrandomized. The decision process formed from these MP tests, given by

$$\Delta^* = \{\delta^*_\eta : \eta \in [0,1]\} \equiv \{\delta^{**}_\eta : \eta \in [0,1]\}, \tag{3.2}$$

is referred to as the most powerful (MP) decision process. The power (at $Q = Q_1$) of the MP test $\delta^*_\eta$ or $\delta^{**}_\eta$ is

$$\rho_{\Delta^*}(\eta) \equiv \pi_{\delta^*_\eta} = \pi_{\delta^{**}_\eta} = 1 - G_1(c(\eta)) + \gamma(\eta)\,\Delta G_1(c(\eta)). \tag{3.3}$$

It is well-known (see [2B]) that if $\pi_{\delta^*_\eta} < 1$, then $\alpha_{\delta^*_\eta} = \eta$. We denote by $A_{\Delta^*}$ and $\rho_{\Delta^*}$ the size and power functions, respectively, of $\Delta^*$. Note that if $\pi_{\delta^*_\eta} < 1$ for all $\eta < 1$, then the mapping $\eta \mapsto \rho_{\Delta^*}(\eta)$ is the ROC function of the MP decision process.
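
To make (3.1) through (3.3) concrete for discrete data, the following is a minimal Python sketch and not part of the original development. It assumes a single binomial observation $X \sim \mathrm{Bin}(n, p)$ with $H_0 : p = p_0$ versus $H_1 : p = p_1 > p_0$, so that the likelihood ratio is increasing in $x$ and the MP test can be stated in terms of $x$ itself. The function names `binom_pmf`, `mp_test`, and `size_and_power` are illustrative choices.

```python
from math import comb


def binom_pmf(n, p):
    """Probability mass function of Bin(n, p) as a list indexed by x = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def mp_test(eta, n, p0):
    """Cutoff c and randomization probability gamma of the size-eta MP test.

    The test rejects when x > c and rejects with probability gamma when x == c.
    Assumes 0 < p0 < 1 and an upper-tailed alternative (p1 > p0).
    """
    pmf0 = binom_pmf(n, p0)
    c, upper = n, 0.0  # upper holds P0(X > c)
    # Lower c while the enlarged rejection region {X >= c} still has size <= eta.
    while c > 0 and upper + pmf0[c] <= eta:
        upper += pmf0[c]
        c -= 1
    gamma = min(1.0, (eta - upper) / pmf0[c])
    return c, gamma


def size_and_power(eta, n, p0, p1):
    """Size under p0 and power under p1 of the randomized MP test of size eta."""
    c, gamma = mp_test(eta, n, p0)
    pmf0, pmf1 = binom_pmf(n, p0), binom_pmf(n, p1)
    size = sum(pmf0[c + 1:]) + gamma * pmf0[c]
    power = sum(pmf1[c + 1:]) + gamma * pmf1[c]
    return size, power
```

Calling `size_and_power(eta, 10, 0.3, 0.6)` over a grid of `eta` values traces out the ROC function of this MP decision process; because of the randomization at the cutoff, the returned size equals `eta` up to floating-point error.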
Proposition 3.1 The function $\rho_{\Delta^*} : [0,1] \to [0,1]$ in (3.3) satisfies $\rho_{\Delta^*}(\eta) \ge \eta$, and is concave, continuous, and nondecreasing. Furthermore, it is strictly increasing on the set $\mathcal{N}_< = \{\eta \in [0,1] : \rho_{\Delta^*}(\eta) < 1\}$.

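Proof: Recall that $\rho_{\Delta^*}(\eta) = \pi_{\delta^*(\eta)}$. The first result follows from the unbiasedness property of MP tests. Suppose $\rho_{\Delta^*}(\cdot)$ is not concave. Then there must exist $\eta_1 \in [0,1]$, $\eta_2 \in [0,1]$, and $\xi \in (0,1)$ such that

$$\xi\,\pi_{\delta^*(\eta_1)} + (1 - \xi)\,\pi_{\delta^*(\eta_2)} > \pi_{\delta^*(\xi\eta_1 + (1-\xi)\eta_2)}. \tag{3.4}$$

Consider the test function $\delta^{**} = \xi\delta^*(\eta_1) + (1 - \xi)\delta^*(\eta_2)$. The size of this test is $\xi\eta_1 + (1 - \xi)\eta_2$, while its power is $\xi\,\pi_{\delta^*(\eta_1)} + (1 - \xi)\,\pi_{\delta^*(\eta_2)}$. From (3.4), the power of $\delta^{**}$ exceeds that of the MP test whose size is $\xi\eta_1 + (1 - \xi)\eta_2$. Since $\delta^{**}$ has the same size as this MP test, this is a contradiction. Thus, $\eta \mapsto \pi_{\delta^*(\eta)}$ must be concave, and hence continuous. Furthermore, since $\eta \le \pi_{\delta^*(\eta)}$ with $\pi_{\delta^*(1)} = 1$, it follows by concavity that it is nondecreasing.

Suppose that on $\mathcal{N}_<$, $\rho_{\Delta^*}(\cdot)$ is not strictly increasing. Then there exists an $\eta_0 \in \mathcal{N}_<$ such that for some $\epsilon > 0$ and $r < 1$, $\rho_{\Delta^*}(\eta) = r$ whenever $\eta \in [\eta_0 - \epsilon, \eta_0] \subset [0,1]$. Let $\eta_1 = \sup\{\eta \in [0,1] : \rho_{\Delta^*}(\eta) = r\}$. Continuity and nondecreasingness of $\rho_{\Delta^*}(\cdot)$ imply that $\eta_1 < 1$. Furthermore, since $\rho_{\Delta^*}(1) = 1$, there exists an $\eta_2 \in (\eta_1, 1]$ with $1 \ge \rho_{\Delta^*}(\eta_2) > r$. Since $r = \rho_{\Delta^*}(\eta_1) < \frac{1}{2}[\rho_{\Delta^*}(\eta_0 - \epsilon) + \rho_{\Delta^*}(\eta_2)]$, then on $[\eta_0 - \epsilon, \eta_2]$, $\rho_{\Delta^*}(\cdot)$ is not concave. This contradicts the concavity of $\rho_{\Delta^*}(\cdot)$, so the supposition could not hold. Therefore, $\rho_{\Delta^*}(\cdot)$ is strictly increasing on $\mathcal{N}_<$. ||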

3.2 P-Value Statistics

We introduce randomized p-value statistics through the framework of decision processes. 

Definition 3.3 Let $\Delta = \{\delta_\eta : \eta \in [0,1]\}$ be a decision process, where $\delta_\eta : (\mathcal{X}, \mathcal{B}) \to ([0,1], \sigma[0,1])$. Let $U \sim U[0,1]$ be independent of $X$. The (randomized) p-value statistic associated with $\Delta$ is $S_\Delta : (\mathcal{X} \times [0,1], \mathcal{B} \otimes \sigma[0,1]) \to ([0,1], \sigma[0,1])$ with $S_\Delta(x, u) = \inf\{\eta \in [0,1] : u \le \delta_\eta(x)\}$.

When the $\delta_\eta$s in Definition 3.3 are nonrandomized, $S_\Delta(X, U)$ coincides with the usual p-value statistic. However, when the $\delta_\eta$s are randomized, $S_\Delta(X, U)$ is a randomized p-value statistic. See also [5] for a more specialized definition of a randomized p-value statistic. Let us denote by $H_0(\cdot)$ and $H_1(\cdot)$ the distribution functions of $S_\Delta(X, U)$ when $\mathcal{L}(X) = Q_0$ and $\mathcal{L}(X) = Q_1$, respectively.

Proposition 3.2 Let $\Delta = \{\delta_\eta : \eta \in [0,1]\}$ be a decision process. Then, for all $s \in [0,1]$,

$$H_0(s) = A_\Delta(s) \quad \text{and} \quad H_1(s) = \pi_{\delta(s)} = \rho_\Delta(s).$$

Proof: Using properties of a decision process, specifically the a.e.-$[Q]$ right-continuity of $\eta \mapsto \delta_\eta(x)$, we have a.e.-$[Q]$, for each $s \in [0,1]$, $\{(x, u) : S_\Delta(x, u) \le s\} = \{(x, u) : u \le \delta_s(x)\}$. Consequently, for $j = 0, 1$, and since $U \sim U[0,1]$ and $X \perp U$, we have $H_j(s) = P_{Q_j}\{S_\Delta(X, U) \le s\} = P_{Q_j}\{U \le \delta_s(X)\} = E_{Q_j} P_{Q_j}\{U \le \delta_s(X) \mid X\} = E_{Q_j}\delta_s(X)$, which equals $A_\Delta(s)$ when $j = 0$ and $\rho_\Delta(s)$ when $j = 1$. ||

Corollary 3.1 For a decision process $\Delta$, $S_\Delta$ has a standard uniform distribution under $\mathcal{L}(X) = Q_0$ if and only if $A_\Delta(\eta) = \eta$ for all $\eta \in [0,1]$.

This result is immediate from Proposition 3.2. Note that the conclusion of Corollary 3.1 holds for the MP decision process $\Delta^*$ in (3.2) provided $\rho_{\Delta^*}(\eta) < 1$ for $\eta < 1$. We also highlight the last result in Proposition 3.2, which states that the ROC function $\rho_\Delta(\cdot)$ equals $H_1(\cdot)$, the distribution of the p-value statistic under the alternative hypothesis. The p-value statistic as defined in Definition 3.3 is general and applicable even with discrete or mixed data and with randomized test functions. We refer the reader to [20] for properties of this randomized p-value statistic and its use in existing FDR-controlling procedures.
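
As an illustration of Definition 3.3 and Corollary 3.1 with discrete data, here is a short Python sketch under assumptions that are not in the original text: a single observation $X \sim \mathrm{Bin}(n, p)$ and the upper-tailed MP decision process for $H_0 : p = p_0$. For that process, the infimum in Definition 3.3 works out to $S(x, u) = P_0(X > x) + u\,P_0(X = x)$. The function names are illustrative.

```python
from math import comb


def binom_pmf(n, p):
    """Probability mass function of Bin(n, p) as a list indexed by x = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def randomized_p_value(x, u, n, p0):
    """S(x, u) = P0(X > x) + u * P0(X = x), with u a draw from U[0, 1]."""
    pmf0 = binom_pmf(n, p0)
    return sum(pmf0[x + 1:]) + u * pmf0[x]


def p_value_cdf(s, n, p0, p_true):
    """P(S <= s) when X ~ Bin(n, p_true), integrating exactly over U ~ U[0, 1]."""
    pmf0, pmf = binom_pmf(n, p0), binom_pmf(n, p_true)
    total = 0.0
    for x in range(n + 1):
        tail = sum(pmf0[x + 1:])
        # Given X = x, S is uniform on [tail, tail + pmf0[x]].
        frac = min(1.0, max(0.0, (s - tail) / pmf0[x]))
        total += pmf[x] * frac
    return total
```

With `p_true == p0`, `p_value_cdf(s, ...)` returns `s` up to rounding, which is the uniformity stated in Corollary 3.1; with `p_true > p0` it returns the power of the size-`s` test, in line with Proposition 3.2.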

4 Optimal Weak FWER Control 

Let us now return to the multiple decision problem formulated in Section 2. We first focus our attention on the subclass $\mathcal{D}_0$ consisting of simple decision functions.

Definition 4.1 A collection $\Delta = (\Delta_m : m \in \mathcal{M})$, where $\Delta_m = (\delta_m(\eta) : \eta \in [0,1])$ is a decision process on $(\mathcal{X} \times [0,1]^M, \mathcal{B} \otimes \sigma[0,1]^M)$, is called a multiple decision process (MDP). It is simple if each $\Delta_m$ is simple; otherwise, it is compound.

Definition 4.2 For a simple MDP $\Delta = (\Delta_m : m \in \mathcal{M})$, its associated multiple decision size function is $A_\Delta = (A_{\Delta_m} : m \in \mathcal{M})$ and its multiple decision ROC function is $\rho_\Delta = (\rho_{\Delta_m} : m \in \mathcal{M})$, where $A_{\Delta_m}$ and $\rho_{\Delta_m}$ are the size and ROC functions, respectively, of $\Delta_m$.

4.1 Optimization Problem 

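Let us suppose that we are given a simple MDP $\Delta$. Then a multiple decision size vector $\eta = (\eta_m : m \in \mathcal{M}) \in \mathcal{N} = [0,1]^M$ will determine from $\Delta$ an MDF $\delta_\Delta(\eta) = (\delta_m(\eta_m) : m \in \mathcal{M})$ belonging to $\mathcal{D}_0$. For this MDF,

$$R_{01}(\delta_\Delta(\eta), Q_0) = \mathrm{FWER}(\delta_\Delta(\eta)) = 1 - \prod_{m \in \mathcal{M}} [1 - A_{\Delta_m}(\eta_m)]$$

and, for $Q_1 \in \mathcal{Q}_1$, $R_2(\delta_\Delta(\eta), Q_1) = M - \sum_{m \in \mathcal{M}} \rho_{\Delta_m}(\eta_m)$. Let us fix an FWER-threshold of $\alpha \in (0,1)$. Suppose there exists a multiple decision size vector $\eta^\dagger_\Delta(\alpha) \in \mathcal{N}$ such that

$$\eta^\dagger_\Delta(\alpha) = \arg\max_{\eta \in \mathcal{N}} \left\{ \sum_{m \in \mathcal{M}} \rho_{\Delta_m}(\eta_m) : \prod_{m \in \mathcal{M}} [1 - A_{\Delta_m}(\eta_m)] \ge 1 - \alpha \right\}.$$

Then, $A_\Delta(\eta^\dagger_\Delta(\alpha)) = (A_{\Delta_m}(\eta^\dagger_{\Delta,m}(\alpha)) : m \in \mathcal{M})$ is the optimal multiple decision size vector for weak FWER control at $\alpha$ associated with the simple MDP $\Delta$. The associated optimal simple MDF is $\delta_\Delta(A_\Delta(\eta^\dagger_\Delta(\alpha)))$.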

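However, since $H_{m0}$ and $H_{m1}$ are both simple, from Section 3 there exists a simple most powerful MDP, $\Delta^* = (\Delta^*_m : m \in \mathcal{M})$, where $\Delta^*_m = (\delta^*_m(\eta) : \eta \in [0,1])$ with $\delta^*_m(\eta)$ the simple Neyman-Pearson MP test function of size $\eta$ for $H_{m0}$ versus $H_{m1}$. Consider the simple MDF obtained from $\Delta^*$ given by $(\delta^*_m(A_{\Delta_m}(\eta^\dagger_{\Delta,m}(\alpha))) : m \in \mathcal{M})$. Then, this simple MDF will satisfy the FWER constraint, and by virtue of the MP property of each $\delta^*_m(A_{\Delta_m}(\eta^\dagger_{\Delta,m}(\alpha)))$ for each $m \in \mathcal{M}$, it follows that

$$\sum_{m \in \mathcal{M}} \rho_{\Delta^*_m}(A_{\Delta_m}(\eta^\dagger_{\Delta,m}(\alpha))) \ge \sum_{m \in \mathcal{M}} \rho_{\Delta_m}(A_{\Delta_m}(\eta^\dagger_{\Delta,m}(\alpha))).$$

This implies that in searching for the optimal weak FWER-controlling simple MDF it suffices to restrict to the simple most powerful MDP $\Delta^*$. Without loss of generality, we may assume $A_{\Delta^*_m}(\eta) = \eta$ for all $m \in \mathcal{M}$ and $\eta \in [0,1]$. The optimization problem amounts to obtaining a multiple decision size vector $\eta^\dagger_{\Delta^*}(\alpha) \in \mathcal{N}$ satisfying

$$\eta^\dagger_{\Delta^*}(\alpha) = \arg\max_{\eta \in \mathcal{N}} \left\{ \sum_{m \in \mathcal{M}} \rho_{\Delta^*_m}(\eta_m) : \prod_{m \in \mathcal{M}} (1 - \eta_m) \ge 1 - \alpha \right\}, \tag{4.1}$$

provided that such a multiple decision size vector exists. The optimal weak FWER-controlling simple MDF is then

$$\delta^*_W(\alpha) \equiv (\delta^*_m(\eta^\dagger_{\Delta^*,m}(\alpha)) : m \in \mathcal{M}). \tag{4.2}$$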

We mention two choices for the size vector $\eta = (\eta_m : m \in \mathcal{M})$ satisfying the FWER constraint in (4.1). The Sidak procedure [33] has

$$\eta_m = \eta_m(\alpha) = 1 - (1 - \alpha)^{1/M}, \quad m \in \mathcal{M}. \tag{4.3}$$

This guarantees that the FWER is exactly equal to $\alpha$, though this requires for its validity the independence condition (I); see (2.3). A conservative choice of $\eta$ is the Bonferroni inequality-derived choice with

$$\eta_m = \eta_m(\alpha) = \alpha/M, \quad m \in \mathcal{M}. \tag{4.4}$$

This choice also satisfies the FWER constraint, though equality is not achieved. However, it does not require the independence condition (I).
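
The two size vectors (4.3) and (4.4) are easy to compare numerically. The Python sketch below is illustrative only; it assumes size-valid tests and the independence condition (I), under which the weak FWER equals $1 - \prod_m (1 - \eta_m)$. The function names are hypothetical.

```python
import math


def sidak_sizes(alpha, M):
    """Common test size from (4.3): 1 - (1 - alpha)^(1/M) for each of M tests."""
    return [1.0 - (1.0 - alpha) ** (1.0 / M)] * M


def bonferroni_sizes(alpha, M):
    """Common test size from (4.4): alpha / M for each of M tests."""
    return [alpha / M] * M


def fwer_under_independence(sizes):
    """Weak FWER 1 - prod(1 - eta_m), computed on the log scale for stability."""
    return -math.expm1(sum(math.log1p(-eta) for eta in sizes))
```

For example, `fwer_under_independence(sidak_sizes(0.05, 20))` recovers 0.05 up to rounding, while the Bonferroni sizes give a slightly smaller FWER, which reflects the conservativeness noted above.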
4.2 Existence of Optimal Size Vector

This subsection establishes the existence of an optimal multiple decision size vector for weak FWER control when dealing with $\mathcal{D}_0$. As pointed out in the preceding subsection, it suffices to look for the optimal weak FWER-controlling simple MDF by starting with the most powerful simple MDP $\Delta^* = (\Delta^*_m : m \in \mathcal{M})$. For brevity, we write $\rho_m = \rho_{\Delta^*_m}$ and $A_m(\eta) = A_{\Delta^*_m}(\eta) = \eta$. Recall that $\mathcal{N} = [0,1]^M$, the multiple decision size space. For $\alpha \in [0,1]$, define

$$\mathcal{C}_\alpha = \begin{cases} \{\eta \in \mathcal{N} : \sum_{m \in \mathcal{M}} \log(1 - \eta_m) \ge \log(1 - \alpha)\} & \text{if } \alpha < 1 \\ \mathcal{N} & \text{if } \alpha = 1 \end{cases}, \tag{4.5}$$

the weak FWER constraint set. The following provides properties of $\mathcal{C}_\alpha$.

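Proposition 4.1 $\mathcal{C}_\alpha$ satisfies (i) $\eta = \mathbf{0} \in \mathcal{C}_\alpha$; (ii) $(\mathbf{0}, \alpha_m) \in \mathcal{C}_\alpha$ for all $m \in \mathcal{M}$, where $(\mathbf{0}, \alpha_m)$ is the zero-vector with the $m$th element replaced by $\alpha$; and (iii) it is convex and closed.

Proof: The results clearly hold when $\alpha = 1$ since $\mathcal{C}_\alpha = \mathcal{N}$. For $\alpha \in [0,1)$, results (i) and (ii) are immediate, while the closedness of $\mathcal{C}_\alpha$ follows from the continuity of the logarithm function. Let $\eta_1, \eta_2 \in \mathcal{C}_\alpha$ with $\eta_1 \ne \eta_2$, and let $\xi \in (0,1)$. Since $\sum_{m \in \mathcal{M}} \log(1 - \eta_{jm}) \ge \log(1 - \alpha)$, $j = 1, 2$, and the mapping $\eta \mapsto \log(1 - \eta)$ is strictly concave, then

$$\sum_{m \in \mathcal{M}} \log[1 - (\xi\eta_{1m} + (1 - \xi)\eta_{2m})] > \xi \sum_{m \in \mathcal{M}} \log(1 - \eta_{1m}) + (1 - \xi) \sum_{m \in \mathcal{M}} \log(1 - \eta_{2m}) \ge \xi\log(1 - \alpha) + (1 - \xi)\log(1 - \alpha) = \log(1 - \alpha).$$

This establishes the convexity of $\mathcal{C}_\alpha$. ||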

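Definition 4.3 The upper set of $\eta_0 \in \mathcal{N}$ is $U(\eta_0) = \{\eta \in \mathcal{N} : \eta_m \ge \eta_{0m}, m \in \mathcal{M}\}$. The upper boundary set of the constraint set $\mathcal{C}_\alpha$ is $UB(\mathcal{C}_\alpha) = \{\eta \in \mathcal{N} : \mathcal{C}_\alpha \cap U(\eta) = \{\eta\}\}$.

Proposition 4.2 For all $\alpha \in [0,1)$, $UB(\mathcal{C}_\alpha) = \{\eta \in \mathcal{N} : \sum_{m \in \mathcal{M}} \log(1 - \eta_m) = \log(1 - \alpha)\}$.

Proof: Let $\eta \in UB(\mathcal{C}_\alpha)$ so $\{\eta\} = \mathcal{C}_\alpha \cap U(\eta)$. Suppose $\sum_{m \in \mathcal{M}} \log(1 - \eta_m) > \log(1 - \alpha)$. Then by continuity of the logarithm function, there exists an $\epsilon > 0$ such that $\eta + \mathbf{1}\epsilon \in \mathcal{N}$ and $\sum_{m \in \mathcal{M}} \log(1 - \eta_m) > \sum_{m \in \mathcal{M}} \log(1 - (\eta_m + \epsilon)) \ge \log(1 - \alpha)$. Thus, $\eta + \mathbf{1}\epsilon \in \mathcal{C}_\alpha$ and clearly $\eta + \mathbf{1}\epsilon \in U(\eta)$. Consequently, $\eta + \mathbf{1}\epsilon \in \mathcal{C}_\alpha \cap U(\eta)$, which contradicts the fact that $\{\eta\} = \mathcal{C}_\alpha \cap U(\eta)$. Therefore, we must have $\sum_{m \in \mathcal{M}} \log(1 - \eta_m) = \log(1 - \alpha)$.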

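On the other hand, let $\eta \in \mathcal{N}$ be such that $\sum_{m \in \mathcal{M}} \log(1 - \eta_m) = \log(1 - \alpha)$. Then $\eta \in \mathcal{C}_\alpha$, and since $\eta \in U(\eta)$, it follows that $\eta \in \mathcal{C}_\alpha \cap U(\eta)$. Suppose there exists an $\eta_1 \in \mathcal{N}$ with $\eta_1 \ne \eta$ and $\eta_1 \in U(\eta)$. Then, $\eta_1 = \eta + \gamma$ with $\gamma_m \ge 0$ for all $m \in \mathcal{M}$, with strict inequality for some $m \in \mathcal{M}$. Therefore, $\sum_{m \in \mathcal{M}} \log(1 - \eta_{1m}) = \sum_{m \in \mathcal{M}} \log(1 - \eta_m - \gamma_m) < \sum_{m \in \mathcal{M}} \log(1 - \eta_m) = \log(1 - \alpha)$. This implies that $\eta_1 \notin \mathcal{C}_\alpha$. Therefore we must have $\{\eta\} = \mathcal{C}_\alpha \cap U(\eta)$, hence $\eta \in UB(\mathcal{C}_\alpha)$. ||

Proposition 4.3 $\mathcal{N}_b = \{\eta \in \mathcal{N} : \sum_{m \in \mathcal{M}} \rho_m(\eta_m) \ge Mb\}$ for $b \in [0,1]$ satisfies (i) $\eta = \mathbf{1} \in \mathcal{N}_b$, (ii) it is closed and convex, and (iii) $\mathcal{N} = \mathcal{N}_0 \supseteq \mathcal{N}_{b_1} \supseteq \mathcal{N}_{b_2}$ for $0 \le b_1 < b_2 \le 1$.

Proof: From Proposition 3.1, $\forall m \in \mathcal{M}$, $\rho_m(1) = 1$, hence $\mathbf{1} \in \mathcal{N}_b$. It was also established in the same proposition that $\eta \mapsto \rho_m(\eta)$ is continuous, nondecreasing, and concave. That $\mathcal{N}_b$ is closed follows from the continuity of each $\rho_m(\cdot)$. The convexity of $\mathcal{N}_b$ follows from the concavity of each of the $\rho_m(\cdot)$, analogously to the proof of Proposition 4.1. The last result is immediate from the definition of $\mathcal{N}_b$. ||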

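Proposition 4.4 Let $\mathcal{B}_\alpha = \{b \in [0,1] : \mathcal{N}_b \cap \mathcal{C}_\alpha \ne \emptyset\}$ for $\alpha \in [0,1)$ and let $b^*_\alpha = \sup \mathcal{B}_\alpha$. Then $\mathcal{B}_\alpha = [0, b^*_\alpha]$.

Proof: Obviously $0 \in \mathcal{B}_\alpha$, so $\mathcal{B}_\alpha$ is nonempty, and hence $b^*_\alpha$ is well-defined. Let $b > 0$ with $b \in \mathcal{B}_\alpha$. Let $b_1 \in [0, b)$. From (iii) of Proposition 4.3, $\mathcal{N}_{b_1} \supseteq \mathcal{N}_b$, hence, since $\mathcal{N}_b \cap \mathcal{C}_\alpha \ne \emptyset$, then $\mathcal{N}_{b_1} \cap \mathcal{C}_\alpha \ne \emptyset$. Therefore, $b_1 \in \mathcal{B}_\alpha$. Let $\{b_n : n = 1, 2, \ldots\}$ be a sequence in $\mathcal{B}_\alpha$ such that $b_n \uparrow b^*_\alpha$. For each $n = 1, 2, \ldots$ there exists an $\eta_n \in \mathcal{N}$ such that $\eta_n \in \mathcal{C}_\alpha$ and $\sum_{m \in \mathcal{M}} \rho_m(\eta_{nm}) \ge M b_n$. Consider the sequence $\{\eta_n\}$ in $\mathcal{N}$. This is a sequence belonging to the closed and bounded set $\mathcal{C}_\alpha$. By the Bolzano-Weierstrass Theorem [35], there exists a subsequence $\{\eta_{n'}\}$ of $\{\eta_n\}$ such that for some $\eta_0 \in \mathcal{C}_\alpha$, $\eta_{n'} \to \eta_0$. Furthermore, since the $\rho_m(\cdot)$s are continuous, then $\sum_{m \in \mathcal{M}} \rho_m(\eta_{0m}) = \lim_{n' \to \infty} \sum_{m \in \mathcal{M}} \rho_m(\eta_{n'm}) \ge M \lim_{n' \to \infty} b_{n'} = M b^*_\alpha$. Therefore, $\eta_0 \in \mathcal{N}_{b^*_\alpha}$, hence $\eta_0 \in \mathcal{C}_\alpha \cap \mathcal{N}_{b^*_\alpha}$. Thus, $b^*_\alpha \in \mathcal{B}_\alpha$. ||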

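Theorem 4.1 (Existence) Let $\alpha \in [0,1)$. Then $\mathcal{C}_\alpha \cap \mathcal{N}_{b^*_\alpha} \ne \emptyset$. Furthermore, $\eta \in \mathcal{N}$ is a weak FWER-$\alpha$ optimal multiple decision size vector if and only if $\eta \in \mathcal{C}_\alpha \cap \mathcal{N}_{b^*_\alpha}$.

Proof: First, observe from Proposition 4.4 that $\mathcal{C}_\alpha \cap \mathcal{N}_{b^*_\alpha} \ne \emptyset$. Each element $\eta_0 \in \mathcal{C}_\alpha \cap \mathcal{N}_{b^*_\alpha}$ satisfies the FWER-$\alpha$ constraint and also achieves the optimal (largest) value of $\sum_{m \in \mathcal{M}} \rho_m(\eta_m)$ among all $\eta \in \mathcal{C}_\alpha$. Therefore, $\eta_0$ is an optimal size vector for FWER control at $\alpha$.

Suppose that $\eta_0$ is an FWER-$\alpha$ optimal solution but $\eta_0 \notin \mathcal{C}_\alpha \cap \mathcal{N}_{b^*_\alpha}$. Then there must exist a $b > b^*_\alpha$ such that $\eta_0 \in \mathcal{N}_b$. Since $\eta_0 \in \mathcal{C}_\alpha$, we have $\mathcal{C}_\alpha \cap \mathcal{N}_b \ne \emptyset$. But this contradicts the maximality of $b^*_\alpha$. Hence, the supposition could not be true. ||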

4.3 Uniqueness of Optimal Size Vector 

Theorem 4.1 guarantees the existence of an optimal weak FWER multiple decision size vector, but does not address whether the solution is unique. We consider this uniqueness issue in this subsection. For this purpose, we first define sections of $\mathcal{C}_\alpha$.

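Definition 4.4 Let $\alpha \in (0,1)$ and $\mathcal{C}_\alpha$ be the constraint set. The $m$th section of $\mathcal{C}_\alpha$ is the subset of $[0,1]$ given by $\mathcal{C}_\alpha(m) = \{\eta_m \in [0,1] : \eta \in \mathcal{C}_\alpha\}$.

Theorem 4.2 (Uniqueness) Let $\alpha \in [0,1)$. If, $\forall m \in \mathcal{M}$, $\eta_m \mapsto \rho_m(\eta_m)$ is strictly increasing on $\mathcal{C}_\alpha(m)$, then the optimal weak FWER-$\alpha$ multiple decision size vector is unique and is the $\eta^*$ with $\mathcal{C}_\alpha \cap \mathcal{N}_{b^*_\alpha} = \{\eta^*\}$.

Proof: It suffices to show from Theorem 4.1 that $\mathcal{C}_\alpha \cap \mathcal{N}_{b^*_\alpha}$ is a singleton set. Suppose it is not a singleton set. Let $\eta_1, \eta_2 \in \mathcal{N}$ with $\eta_1 \ne \eta_2$ such that for $j = 1, 2$, $\sum_{m \in \mathcal{M}} \log(1 - \eta_{jm}) \ge \log(1 - \alpha)$ and $\sum_{m \in \mathcal{M}} \rho_m(\eta_{jm}) \ge M b^*_\alpha$. Let $\xi \in (0,1)$ and define $\eta^* = \xi\eta_1 + (1 - \xi)\eta_2$. By convexity of both $\mathcal{C}_\alpha$ and $\mathcal{N}_{b^*_\alpha}$, we have $\eta^* \in \mathcal{C}_\alpha \cap \mathcal{N}_{b^*_\alpha}$. But, due to the strict concavity of the map $\eta \mapsto \log(1 - \eta)$, $\sum_{m \in \mathcal{M}} \log(1 - \eta^*_m) = \sum_{m \in \mathcal{M}} \log[1 - (\xi\eta_{1m} + (1 - \xi)\eta_{2m})] > \xi \sum_{m \in \mathcal{M}} \log(1 - \eta_{1m}) + (1 - \xi) \sum_{m \in \mathcal{M}} \log(1 - \eta_{2m}) \ge \log(1 - \alpha)$. Thus, $\eta^* \in \mathcal{C}_\alpha \setminus UB(\mathcal{C}_\alpha)$. By continuity of the logarithm function, there exists an $m_0 \in \mathcal{M}$ and an $\epsilon_0 > 0$ such that $\sum_{m \in \mathcal{M}} \log[1 - (\eta^*_m + \epsilon_0 I\{m = m_0\})] \ge \log(1 - \alpha)$. Observe that $(\eta^*_m + \epsilon_0 I\{m = m_0\} : m \in \mathcal{M})$ belongs to both $\mathcal{C}_\alpha$ and $U(\eta^*)$. Since $\forall m \in \mathcal{M}$, $\eta_m \mapsto \rho_m(\eta_m)$ is strictly increasing on $\mathcal{C}_\alpha(m)$, then $\sum_{m \in \mathcal{M}} \rho_m(\eta^*_m + \epsilon_0 I\{m = m_0\}) > \sum_{m \in \mathcal{M}} \rho_m(\eta^*_m) \ge M b^*_\alpha$. But this contradicts the maximality of $b^*_\alpha$. Therefore, $\mathcal{C}_\alpha \cap \mathcal{N}_{b^*_\alpha}$ must be a singleton set. ||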

Corollary 4.1 If, $\forall m \in \mathcal{M}$, $\eta_m \in [0, \sup \mathcal{C}_\alpha(m)) \Rightarrow \rho_m(\eta_m) < 1$, then the optimal weak FWER-$\alpha$ multiple decision size vector is unique.

Proof: Follows from Theorem 4.2 and Proposition 3.1, since the condition implies that for all $m \in \mathcal{M}$, $\eta_m \mapsto \rho_m(\eta_m)$ is strictly increasing on $\mathcal{C}_\alpha(m)$. ||

Non-uniqueness of the optimal weak FWER-controlling multiple decision size vector may occur with non-regular families of densities, such as the uniform or shifted exponential densities, where the power of the test may equal one even though its size is still less than one. It may also occur if the decision processes in the MDP are not size-valid (see the paragraph after Definition 3.2), such as with discrete data or nonparametric rank-based methods when randomized tests are not permitted. In such situations, the mappings $\eta_m \mapsto \rho_m(\eta_m)$ for $m \in \mathcal{M}$ need not be strictly increasing, leading to a non-singleton set $\mathcal{C}_\alpha \cap \mathcal{N}_{b^*_\alpha}$.

4.4 Finding Optimal Size Vector 

This subsection addresses the computational problem of finding the optimal weak FWER multiple decision size vector. Generally, without differentiability of the ROC functions, as with discrete distributions, linear or nonlinear programming methods are needed to obtain the optimal solution. Below we present a method for when the ROC functions are twice-differentiable.

Theorem 4.3 Let $\Delta^* = (\Delta^*_m, m \in \mathcal{M})$ be the most powerful MDP. Assume that the ROC functions $\eta_m \mapsto \rho_m(\eta_m)$ are strictly increasing and twice-differentiable with first and second derivatives $\rho'_m$ and $\rho''_m$, respectively. Given $\alpha \in (0,1)$, the optimal weak FWER-$\alpha$ multiple decision size vector $\eta^* \equiv \eta^*(\alpha) = (\eta^*_m(\alpha), m \in \mathcal{M})$ is the $\eta$ that solves the set of Lagrange equations

$$\forall m \in \mathcal{M}, \quad \rho'_m(\eta_m)(1 - \eta_m) = \lambda, \text{ for some } \lambda \in \Re_+; \tag{4.6}$$

$$\sum_{m \in \mathcal{M}} \log(1 - \eta_m) = \log(1 - \alpha). \tag{4.7}$$

Proof: Since the $\rho_m$s are strictly increasing, by Theorem 4.2 there is a unique optimal solution. Form the Lagrange function on $(0,1)^M \times \Re$ via

$$J(\eta, \lambda) = \sum_{m \in \mathcal{M}} \rho_m(\eta_m) + \lambda \left[ \sum_{m \in \mathcal{M}} \log(1 - \eta_m) - \log(1 - \alpha) \right].$$

For $m \in \mathcal{M}$, $\partial J/\partial\eta_m = \rho'_m(\eta_m) - \lambda/(1 - \eta_m)$ and $\partial J/\partial\lambda = \sum_{m \in \mathcal{M}} \log(1 - \eta_m) - \log(1 - \alpha)$. Equating these to zero yields conditions (4.6) and (4.7).

To show that the solution of (4.6) and (4.7) is a maximizer of $J$, we need to verify that the sequence of determinants of the principal minors of the bordered Hessian matrix, evaluated at this solution, alternates in sign. The second partial derivatives of the Lagrange function are, for $m, n \in \mathcal{M}$, $\partial^2 J/\partial\eta_m\partial\eta_n = \{\rho''_m(\eta_m) - \lambda/(1 - \eta_m)^2\} I\{m = n\}$, $\partial^2 J/\partial\eta_m\partial\lambda = \partial^2 J/\partial\lambda\partial\eta_m = -1/(1 - \eta_m)$, and $\partial^2 J/\partial\lambda^2 = 0$. The solution of (4.6) and (4.7) satisfies $\lambda = \rho'_m(\eta_m)(1 - \eta_m)$, $m \in \mathcal{M}$. Since $\rho'_m(\eta_m) > 0$, then at the solution, $\lambda > 0$. Furthermore, since by Proposition 3.1, $\rho_m(\cdot)$ is concave, then $\rho''_m(\eta_m) \le 0$. As a consequence, at the solution, $\partial^2 J/\partial\eta_m\partial\eta_n \le 0$, $\partial^2 J/\partial\eta_m\partial\lambda < 0$, and $\partial^2 J/\partial\lambda^2 = 0$. The bordered $(M + 1) \times (M + 1)$ Hessian matrix evaluated at the solution of (4.6) and (4.7) is of form

$$H = \begin{bmatrix} 0 & -a^t \\ -a & -\mathrm{Dg}(b) \end{bmatrix},$$

where $\mathrm{Dg}(b)$ is the diagonal matrix with diagonal elements consisting of the elements of the vector $b$, and with $b^t = -(\rho''_m(\eta_m) - \lambda/(1 - \eta_m)^2, m \in \mathcal{M})$ and $a^t = (1/(1 - \eta_m), m \in \mathcal{M})$. Observe that all elements of $b$ and $a$ are nonnegative. The $m$th principal minor of $H$ is

$$H_m = \begin{bmatrix} 0 & -a_m^t \\ -a_m & -\mathrm{Dg}(b_m) \end{bmatrix},$$

where $b_m = (b_1, b_2, \ldots, b_m)^t$ and $a_m = (a_1, a_2, \ldots, a_m)^t$. Since

$$\det(H_m) = (-1)^{m+2} \left( \prod_{k=1}^{m} b_k \right) \left( \sum_{k=1}^{m} \frac{a_k^2}{b_k} \right)$$

and the $a_k$s and $b_k$s are nonnegative, then the determinants of the principal minors of the bordered Hessian matrix alternate in sign, starting with a negative sign. Consequently, the solution of (4.6) and (4.7) is a maximizer of the Lagrange function, and hence maximizes $\sum_{m \in \mathcal{M}} \rho_m(\eta_m)$ subject to the FWER $\alpha$-level constraint. ||
Proposition 4.5 Assume the conditions of Theorem 4.3. Then, for each $m \in \mathcal{M}$, the mapping $\alpha \mapsto \eta^*_m(\alpha)$ is nondecreasing and continuous.

Proof: Define $g_m(\eta_m) = \rho'_m(\eta_m)(1 - \eta_m)$. Since $\rho'_m(\eta_m) \ge 0$ by the nondecreasing property, then $g_m(\eta_m) \ge 0$, $\forall \eta_m \in [0,1]$. Furthermore, $g'_m(\eta_m) = \rho''_m(\eta_m)(1 - \eta_m) - \rho'_m(\eta_m)$, so since $\rho''_m(\eta_m) \le 0$ by concavity, then $g'_m(\eta_m) \le 0$. Therefore, each $g_m(\cdot)$ is a nonincreasing function. The defining condition of $\eta^*$ in (4.6) is $g_m(\eta_m) = \lambda$, $\forall m \in \mathcal{M}$, together with the constraint condition in (4.7). Suppose $\alpha$ increases from $\alpha_1$ to $\alpha_2$. This entails that the left-hand side of (4.7) must decrease, but for this to happen, and since each $g_m(\cdot)$ is nonincreasing, the common value of the Lagrange constant $\lambda$ must decrease. But then each $\eta_m$ cannot decrease. Thus, for each $m \in \mathcal{M}$, $\eta^*_m(\alpha_1) \le \eta^*_m(\alpha_2)$. Continuity follows immediately since the existence of the second derivative implies that $\rho'_m$ is continuous. ||

The monotonicity in Proposition 4.5 is a desirable property since it implies that if at FWER size $\alpha_1$ we have $\delta^*_m(\eta^*_m(\alpha_1)) = 1$, then at an FWER size $\alpha_2$ with $\alpha_2 > \alpha_1$, we also have $\delta^*_m(\eta^*_m(\alpha_2)) = 1$. This will also be critical in proving a martingale property needed for the development of the FDR-controlling procedure. Further examination of the proofs of Propositions 4.3 and 4.4 and Theorem 4.1 also reveals that, given any MDP $\Delta$ whose ROC functions are nondecreasing, concave, and continuous, there exists an optimal FWER-controlling multiple decision size vector specific to this $\Delta$. Our starting with the most powerful MDP $\Delta^*$ was in order to get the optimal FWER-controlling MDF among all MDFs in $\mathcal{D}_0$. This observation will play an important role in extending our results to situations with composite hypotheses. The basic idea is to start with a collection of MDPs, find the optimal weak FWER-controlling multiple decision size vector for each MDP, then choose the best among the optimal MDFs arising from each of these MDPs.

4.5 An Example for Weak FWER Control 

In this subsection we demonstrate the weak FWER-controlling procedure by considering a 
concrete situation with normal distributions. An earlier version of the manuscript also dealt 
with exponential distributions and binomial distributions. However, due to space constraints, 
we do not present these other examples. 

For $m \in \mathcal{M}$, let $X_m \sim N(\mu_m, \sigma^2_{m0})$, where the $\mu_m$s are unknown and the $\sigma^2_{m0}$s are known. Consider the multiple hypotheses testing problem $H_{m0} : \mu_m = \mu_{m0}$ and $H_{m1} : \mu_m = \mu_{m1}$ with $\mu_{m0} < \mu_{m1}$ for $m \in \mathcal{M}$. The MP test of size $\eta_m$ for $H_{m0}$ versus $H_{m1}$ is

$$\delta^*_m(X_m; \eta_m) \equiv \delta^*_m(\eta_m) = I\{X_m \ge \mu_{m0} + \sigma_{m0}\Phi^{-1}(1 - \eta_m)\}, \tag{4.8}$$

where $\Phi(\cdot)$ and $\Phi^{-1}(\cdot)$ are the cumulative distribution and quantile functions, respectively, of a standard normal random variable. The $m$th effect size is $\gamma_m = (\mu_{m1} - \mu_{m0})/\sigma_{m0}$, and the ROC function of the decision process $\Delta^*_m = (\delta^*_m(\eta_m) : \eta_m \in [0,1])$ is

$$\rho_m(\eta_m) \equiv \rho_m(\eta_m; \gamma_m) = \Phi(\gamma_m - \Phi^{-1}(1 - \eta_m)), \tag{4.9}$$

clearly twice-differentiable with respect to $\eta_m$. With $\phi(z) = \exp(-z^2/2)/\sqrt{2\pi} = \Phi'(z)$ being the standard normal density function, the derivative of $\rho_m(\cdot)$ is

$$\rho'_m(\eta_m) = \frac{\phi(\gamma_m - \Phi^{-1}(1 - \eta_m))}{\phi(\Phi^{-1}(1 - \eta_m))}. \tag{4.10}$$

For fixed $\alpha \in (0,1)$ and $\gamma_m$s, consider the mappings $d \mapsto \eta_m(d)$, $m \in \mathcal{M}$, defined implicitly by the equation

$$\frac{\phi(\gamma_m - \Phi^{-1}(1 - \eta_m))}{\phi(\Phi^{-1}(1 - \eta_m))}\,(1 - \eta_m) = d. \tag{4.11}$$

The optimal value of $d$, denoted by $d^*$, solves the equation

$$\sum_{m \in \mathcal{M}} \log(1 - \eta_m(d)) - \log(1 - \alpha) = 0. \tag{4.12}$$

The optimal sizes of the $M$ MP tests are then $\eta_m(d^*)$, $m \in \mathcal{M}$. An R [22] implementation of this numerical problem first defines $\nu_m = \Phi^{-1}(1 - \eta_m)$, so condition (4.11) amounts to solving for $\nu_m = \nu_m(d)$ the equation

$$\log\Phi(\nu_m) + \gamma_m\nu_m - \log(d) - \gamma_m^2/2 = 0. \tag{4.13}$$

The R implementation utilized a Newton-Raphson iteration in solving for the $\nu_m$s in (4.13) and the uniroot routine in the R Library to solve for $d$ in (4.12). Upon obtaining the $\nu_m(d)$s, the $\eta_m(d)$s are computed via $\eta_m(d) = 1 - \Phi(\nu_m(d))$.
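
The two-level root-finding just described can be sketched in a few lines of Python. This is not the authors' R code: it is a minimal stand-in that swaps both the Newton-Raphson iteration and uniroot for plain bisection, uses only the standard library, and assumes every effect size is strictly positive. All function names (`log_Phi`, `nu_of_d`, `optimal_sizes`, `efficiency_vs_sidak`) are hypothetical.

```python
import math
from statistics import NormalDist

SQRT2 = math.sqrt(2.0)


def upper_tail(v):
    """1 - Phi(v), accurate for large v."""
    return 0.5 * math.erfc(v / SQRT2)


def log_Phi(v):
    """log Phi(v), computed stably in both tails."""
    if v < 0:
        return math.log(0.5 * math.erfc(-v / SQRT2))
    return math.log1p(-upper_tail(v))


def bisect(f, lo, hi, iters=100):
    """Root of an increasing function f with f(lo) < 0 <= f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def nu_of_d(gamma, log_d):
    """Solve the analogue of (4.13) for nu, given log(d) and gamma > 0."""
    h = lambda v: log_Phi(v) + gamma * v - log_d - 0.5 * gamma * gamma
    lo, hi = -30.0, 1.0
    while h(hi) < 0.0:  # h is increasing in v, so expand until the root is bracketed
        hi *= 2.0
    return bisect(h, lo, hi)


def optimal_sizes(gammas, alpha):
    """Sizes eta_m(d*) solving the analogues of (4.12) and (4.13)."""
    if min(gammas) <= 0:
        raise ValueError("this sketch assumes strictly positive effect sizes")
    target = math.log1p(-alpha)
    # Sum of log(1 - eta_m(d)) = sum of log Phi(nu_m(d)), increasing in log d.
    F = lambda log_d: sum(log_Phi(nu_of_d(g, log_d)) for g in gammas) - target
    log_d = bisect(F, -50.0, 50.0)
    return [upper_tail(nu_of_d(g, log_d)) for g in gammas]


def power(gamma, eta):
    """ROC function (4.9) at size eta for effect size gamma."""
    z = -NormalDist().inv_cdf(eta)  # Phi^{-1}(1 - eta)
    return NormalDist().cdf(gamma - z)


def efficiency_vs_sidak(gammas, alpha):
    """100 * (average optimal power) / (average Sidak power)."""
    eta_sidak = 1.0 - (1.0 - alpha) ** (1.0 / len(gammas))
    opt = optimal_sizes(gammas, alpha)
    p_opt = sum(power(g, e) for g, e in zip(gammas, opt))
    p_sid = sum(power(g, eta_sidak) for g in gammas)
    return 100.0 * p_opt / p_sid
```

A call such as `optimal_sizes([0.5, 0.5, 1.0, 1.0], 0.05)` can be checked against the $M = 4$ column of Table 1, on the assumption that the table was computed at $\alpha = 0.05$; that value is not stated in the caption but is consistent with the equal-effect-size row, where $1 - 0.95^{1/4} \approx .0127$. Bisection is slower than Newton-Raphson but needs no derivative and no starting value, which keeps the sketch short.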

Figure 1 demonstrates the optimal sizes for different effect sizes when $M = 2000$ for uniformly distributed effect sizes. Observe that when the effect size is small, which translates to low power, the optimal size for the test is also small; but when the effect size is large, which translates to high power, the optimal test size is again small. For the tests with moderate effect sizes or power, the optimal sizes are higher. This behavior can also be seen in the second panel of the figure, which shows the achieved power of the tests at the optimal sizes, and in Table 1, which contains results for small values of $M$.

We also compared the efficiency of the optimal procedure relative to the Sidak procedure. The measure of efficiency is the ratio (multiplied by 100) of the average power over the $M$ tests, defined by $\sum_{m \in \mathcal{M}} \rho_m(\eta_m)/M$, of the optimal procedure to the average power of the Sidak procedure. The fourth panel in Figure 1 depicts the powers of the resulting tests versus the effect size for both procedures (solid blue = optimal; dashed red = Sidak). In these uniformly-generated effect sizes, the efficiency of the optimal procedure over the Sidak is 103.5%. This efficiency is affected by the vector of effect sizes. For instance, when we change the effect sizes in Figure 1 to be generated from a uniform over [.1, 2], the efficiency jumps to 181.7%, though it should also be pointed out that since the effect sizes are small, the overall powers of both procedures are also small.
Table 1: Optimal test sizes under normality for different power/effect size configurations. The configuration and the optimal sizes are described by the notation k : (a, b, ...), which is interpreted as having k of each of the elements in the vector (a, b, ...). The relative efficiency (in percent) of the optimal procedure relative to the Sidak procedure is also presented.

Effect Size, $\gamma$,      Optimal Test Sizes / [Efficiency over Sidak (in %)]
Configuration            M = 4                                       M = 20
M : 1                    4 : .0127                         [100.0]   20 : .0026                      [100.0]
M/2 : (.5, 1)            2 : (.0009, .0245)                [113.6]   10 : (0, .0051)                 [125.1]
M/2 : (1, 2)             2 : (.0050, .0204)                [104.5]   10 : (.0001, .0050)             [115.3]
M/2 : (1, 5)             2 : (.0228, .0026)                [103.6]   10 : (.0035, .0016)             [100.3]
M/4 : (0.5, 1, 2, 4)     1 : (.0001, .0128, .0303, .0075)  [105.4]   5 : (0, .0003, .0068, .0031)    [107.1]
M/4 : (1, 2, 4, 8)       1 : (.0128, .0304, .0075, 0)      [105.0]   5 : (.0003, .0068, .0031, 0)    [104.3]
Figure 1: Optimal test sizes and powers for 2000 MP tests of hypotheses under normality when the effect sizes were generated from a uniform[.1, 10] distribution. Panel four shows the powers for both the optimal [solid black] and the Sidak [dashed red] tests with respect to effect sizes.

4.6 A Size-Investing Strategy

In this concrete example we observed from Figure 1 and Table 1 the phenomenon where, among the $M$ tests, those with low powers (small effect sizes) and those with high powers (large effect sizes) are allocated relatively small sizes in the weak FWER-controlling optimal procedure. The tests getting the larger sizes are those with moderate powers or effect sizes. We refer to this as a size-investing strategy in the multiple hypotheses testing problem. The theoretical basis for this strategy, at least under the conditions of Theorem 4.3, is the first condition for optimality in (4.6), which is tied to the rates of change of the ROC functions of the MP multiple decision process, together with a penalty incurred from larger sizes.

This strategy can be explained intuitively. With the overall goal of getting more real discoveries while controlling the proportion of false discoveries for a pre-specified, usually small, overall size $\alpha$, the optimal procedure dictates that not much size should be accorded to those tests with either very low or very high powers. The former will not lead to any discoveries anyway if the size that could be allocated is small, while the latter will lead to discoveries even if the test sizes are made small. Thus, there is more to be gained by investing larger sizes in those tests that are of moderate power, and an appropriate tweaking of their test sizes according to condition (4.6) improves the ability to achieve more real discoveries. However, this phenomenon depends on the magnitude of the overall size. If this overall size is made larger, then more leeway may ensue, to the extent that it may then be more beneficial to allocate more size also to those tests with low powers, since the tests with moderate powers when their sizes were small may now have larger powers because of the consequent increase in their sizes. The precise and crucial determinants of where the differential sizes should be allocated are the rates of change of the ROC functions, with some size-attenuation. See also the interesting discussions of size and weight allocation strategies in [58], where the size allocation was related to the '$\alpha$-spending' function of [26], and fT6], which deals with $\alpha$-investing in sequential procedures that control expected false discoveries, as well as [IHl El], which discuss optimal weights for the p-values.
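
The role of condition (4.6) in this strategy can be seen numerically in the normal example of Section 4.5. From (4.10), $\rho'_m(\eta)(1 - \eta) = \exp(\gamma_m\nu - \gamma_m^2/2)(1 - \eta)$ with $\nu = \Phi^{-1}(1 - \eta)$, which at a fixed common size is largest when $\gamma_m$ is near $\nu$. The Python sketch below is an illustration under that model only; it evaluates this quantity at the Sidak size for a few effect sizes, with $M = 4$ and $\alpha = 0.05$ chosen for illustration, and the function name `marginal_gain` is hypothetical.

```python
import math
from statistics import NormalDist


def marginal_gain(gamma, eta):
    """Left-hand side of (4.6), rho'(eta) * (1 - eta), for the normal ROC function."""
    nu = -NormalDist().inv_cdf(eta)  # Phi^{-1}(1 - eta)
    return math.exp(gamma * nu - 0.5 * gamma**2) * (1.0 - eta)


# Illustrative setting: a common Sidak size for M = 4 tests at alpha = 0.05.
eta_sidak = 1.0 - (1.0 - 0.05) ** (1.0 / 4)
gains = {g: marginal_gain(g, eta_sidak) for g in (0.5, 1.0, 2.0, 4.0, 8.0)}
for g, value in gains.items():
    print(f"gamma = {g:4.1f}   rho'(eta)(1 - eta) = {value:.4g}")
```

Since the left-hand side of (4.6) is nonincreasing in $\eta_m$, equalizing it across tests means raising the sizes of the tests where it is currently largest (the moderate effect sizes) and lowering the others, which is the size-investing pattern described above.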

Interestingly, a tangential real-life manifestation of this size-investing strategy occurred during the 2008 American presidential election, with the total resources (financial, manpower, etc.) available to the candidates analogous to the overall size in the multiple testing problem. In the waning days of the campaign, the major presidential candidates, then-Senator Barack Obama of the Democratic Party and Senator John McCain of the Republican Party, focused their campaign efforts, in terms of allocating their financial and manpower resources, on the 'battleground states' of North Carolina, Virginia, and Pennsylvania, while basically ignoring the 'in-the-bag states' of South Carolina, then expected to vote for McCain, and California, then expected to vote for Obama. Also, by virtue of its deep resources, the Obama campaign was able to allocate more resources even in states that traditionally voted Republican, whereas the McCain campaign, with a relatively smaller war chest, had to 'drop' some states (e.g., Michigan) in its campaign. These opposing behaviors of the two camps could be explained by the size-investing strategy with proper accounting of each campaign's overall resources.

5 Restrictions, Extensions, and Connections 
5.1 On the Restriction to $\mathcal{D}_0$

The optimization problem for weak FWER control could be construed as limited since we restricted our search for the optimal MDF to the class $\mathcal{D}_0$ consisting of simple MDFs, even though the $m$th component of the MDF is the MP test function for $H_{m0}$ versus $H_{m1}$ based on $(X_m, U_m)$. The resulting optimal weak FWER-controlling procedure is therefore still simple. Storey [50] and Sun and Cai [53] have argued and demonstrated that performance could be improved in multiple decision problems through compound MDFs. These are characterized by the phenomenon that in testing the $m$th pair of hypotheses, information is borrowed from the other components of the data vector $X = (X_m : m \in \mathcal{M})$, analogous to the James-Stein p3] shrinkage phenomenon. An example of a compound MDF is the estimated optimal discovery procedure (ODP) in [50] and [51], though the ODP by itself as defined in Lemma 2 in [Sn] is not yet compound in the sense defined above, but it does use the densities of all the components to form the significance thresholding function. Other examples of compound MDFs are the FDR-controlling procedure in [Ij and the oracle-based adaptive MDFs in [53].

A question arises whether we could start immediately with compound MDFs to search for an optimal weak (or strong) FWER-controlling compound MDF. Thus, suppose that $\delta = (\delta_m : m \in \mathcal{M})$ is a compound MDF so that $\delta_m$ depends on $(X, U)$ and not only on $(X_m, U_m)$. For such an MDF, we have

$$R_{01}(\delta, Q) = P_Q\{\cup_{m \in \mathcal{M}_0(Q)} [\delta_m(X, U) = 1]\} \tag{5.1}$$

$$= 1 - P_Q\{\cap_{m \in \mathcal{M}_0(Q)} [\delta_m(X, U) = 0]\}. \tag{5.2}$$

Observe that even if the independence condition (I) holds, $(\delta_m(X, U) : m \in \mathcal{M}_0(Q))$ need not be an independent collection. As such, no closed-form exact expression for $R_{01}(\delta, Q)$ need exist. Certainly, the right-hand side expression in (5.1) could be Bonferroni-bounded by

$$\mathrm{EFP}(\delta, Q) \equiv \sum_{m \in \mathcal{M}_0(Q)} \alpha_{\delta_m}(Q), \tag{5.3}$$

called the expected number of false positives in [SU]. Alternatively, if a generalized positive quadrant dependence (PQD) condition holds among these components, which states that

$$P_Q\{\cap_{m \in \mathcal{M}_0(Q)} [\delta_m(X, U) = 0]\} \ge \prod_{m \in \mathcal{M}_0(Q)} P_Q\{\delta_m(X, U) = 0\},$$

then the right-hand side in (5.2) could be upper-bounded by

$$\mathrm{PQD}(\delta, Q) \equiv 1 - \prod_{m \in \mathcal{M}_0(Q)} [1 - \alpha_{\delta_m}(Q)], \tag{5.4}$$

where $\alpha_{\delta_m}(Q) = E_Q\delta_m(X, U)$, the size of $\delta_m$ when $m \in \mathcal{M}_0(Q)$. For this compound MDF, its MDR is

$$R_2(\delta, Q) = \sum_{m \in \mathcal{M}_1(Q)} [1 - \pi_{\delta_m}(Q)], \tag{5.5}$$

where $\pi_{\delta_m}(Q) = E_Q\delta_m(X, U)$ is the power of $\delta_m$ when $m \in \mathcal{M}_1(Q)$.
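
For a quick numerical feel for the two bounds (5.3) and (5.4), the following Python sketch evaluates both from a list of sizes of the tests whose null hypotheses are true. It is illustrative only; the function names are hypothetical, and the inputs are assumed to be sizes in $[0, 1)$.

```python
import math


def efp_bound(null_sizes):
    """Bonferroni-type bound (5.3): the sum of the sizes over the true nulls."""
    return sum(null_sizes)


def pqd_bound(null_sizes):
    """PQD-based bound (5.4): 1 - prod(1 - size), computed on the log scale."""
    return -math.expm1(sum(math.log1p(-a) for a in null_sizes))
```

Because $\prod_m (1 - a_m) \ge 1 - \sum_m a_m$ for sizes in $[0, 1]$, `pqd_bound` never exceeds `efp_bound`; with five true nulls each tested at size 0.01, for instance, the two bounds are $1 - 0.99^5$ and 0.05.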

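An optimization approach is to put an upper threshold of $\alpha \in (0,1)$ on either (5.3) or (5.4), and to obtain an MDF $\delta$ minimizing $R_2(\delta, Q)$, or, equivalently, maximizing $\mathrm{ETP}(\delta, Q) = \sum_{m \in \mathcal{M}_1(Q)} \pi_{\delta_m}(Q)$, a quantity referred to in [SO] as the expected number of true positives. Spjøtvoll's [16] optimal procedure is the simple MDF defined by

$$\delta_{SPJ}(\alpha) = \arg\max_{\delta \in \mathcal{D}_0} \{\mathrm{ETP}(\delta, Q_1) : \mathrm{EFP}(\delta, Q_0) \le \alpha\}, \tag{5.6}$$

where $Q_0 \in \mathcal{Q}_0$ and $Q_1 \in \mathcal{Q}_1$. On the other hand, Storey's [50] ODP is defined via

$$\delta_{STO}(\alpha; Q) = \arg\max_{\delta \in \mathcal{D}} \{\mathrm{ETP}(\delta, Q) : \mathrm{EFP}(\delta, Q) \le \alpha\}, \tag{5.7}$$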

where $Q$ is the true underlying probability measure of $X$. His use of the EFP as Type I error measure enabled a calculus of variations optimization to obtain the form of the optimal MDF. His procedure has a particularly interesting structure when we utilize as its input the vector of $p$-value statistics $(S_m(x_m, u_m) : m \in \mathcal{M})$ arising from the most powerful multiple decision process $\Delta^* = (\Delta^*_m : m \in \mathcal{M})$ with multiple decision size function $A^* = ((A^*_m(\eta) : \eta \in [0, 1]) : m \in \mathcal{M})$ and multiple decision ROC function $\rho^* = ((\rho^*_m(\eta) : \eta \in [0, 1]) : m \in \mathcal{M})$, and with $A^*_m(\cdot)$ and $\rho^*_m(\cdot)$ being both differentiable with respective derivatives $(A^*_m)'(\cdot)$ and $(\rho^*_m)'(\cdot)$. Defining the so-called significance thresholding function $\mathcal{S} : ([0, 1], \sigma[0, 1]) \to \Re$ via

$$\mathcal{S}(s; Q) = \frac{\sum_{m \in \mathcal{M}_1(Q)} (\rho^*_m)'(s)}{\sum_{m \in \mathcal{M}_0(Q)} (A^*_m)'(s)}, \tag{5.8}$$

with this expression following directly from Lemma 2 in [50] and Proposition 3.2, then $\delta_{STO} = (\delta_{m,STO} : m \in \mathcal{M})$ has the structure

$$\delta_{m,STO}(S_m(x_m, u_m); Q) = I\{\mathcal{S}(S_m(x_m, u_m); Q) \ge \lambda\}, \quad m \in \mathcal{M}, \tag{5.9}$$

for $\lambda \in [0, \infty)$ chosen so the size constraint on $\mathrm{EFP}(\delta_{STO}(\alpha; Q), Q)$ is approximately satisfied. Observe that each of the components in (5.9) is still of simple-type, that is, the $m$th component depends only on $(x_m, u_m)$, unless the cut-off $\lambda$ is determined in a data-dependent manner using the full data $(x, u)$. Note also that $\delta_{STO}$ was derived under complete knowledge of the unknown $Q$, or more specifically, the sets $\mathcal{M}_0(Q)$ and $\mathcal{M}_1(Q)$, as can be seen in (5.8). For the simple null versus simple alternative case, the size functions $A^*_m(\cdot)$s and the ROC functions $\rho^*_m(\cdot)$s will be known, but with composite hypotheses they may be unknown. In order to implement $\delta_{STO}$, it was proposed in [50] and [51] that these unknown quantities, sets, functions, or the significance thresholding function, be estimated based on the data $(x, u)$. But doing so makes the MDF of compound-type, and the exact optimality property of the ODP need not hold anymore, though it could be argued that if good estimates are utilized then the resulting estimated ODP will have desirable properties. See [53] and [15] for an interesting discussion of the ODP. In contrast, note that $\delta_{SPJ}$ is determined only by the extreme probability measures $Q_0$ and $Q_1$ whose marginal probability measures, $Q_m$s, are completely known, and not by the unknown true probability measure $Q$. This fact was criticized by Storey [50] as a 'potentially problematic optimality' criterion. More importantly, it should be recognized that both $\delta_{SPJ}$ and $\delta_{STO}$ are not necessarily the optimal weak or strong FWER- or FDR-controlling MDFs since the Bonferroni upper bound for $R_0(\delta, Q)$ utilized in the derivations is hardly a sharp upper bound.

The criticism leveled against $\delta_{SPJ}$ could also be invoked against our optimal weak FWER-controlling procedure since in our optimization we also relied on a criterion that was determined only by the extreme probability measures $Q_0$ and $Q_1$. Note, however, that each of the components of the optimal weak FWER-controlling multiple decision size vector, and consequently each of the components of $\delta^*_W(\alpha)$, uses all of the $Q_{m0}$s and $Q_{m1}$s, analogously to Storey's ODP, though the MDF $\delta^*_W(\alpha)$ is at this point neither adaptive nor compound. Our development of this simple MDF, which is optimal in the class $\mathcal{D}_0$, is a prelude to the development of our adaptive and compound MDFs strongly controlling FWER and FDR. The MDF $\delta^*_W(\alpha)$ serves as an anchor towards the development of these strongly FWER- and FDR-controlling compound MDFs. These new MDFs will be discussed in Section 6 for strong FWER control and in Section 7 for strong FDR control. One may characterize our approach to obtaining these strongly-controlling MDFs as indirect, whereas Storey's [50] approach may be viewed as a more direct approach. There is also an intrinsic difference in the problems considered since we are focusing on the Type I error risk functions $R_0$ and $R_1$, whereas in [50] and [16] the simpler Type I error metric of expected number of false positives (EFP) was utilized. Looking forward, though our starting point is still the optimal weak FWER-controlling simple MDF $\delta^*_W(\alpha)$, there is confidence in the viability of this indirect approach to generate good MDFs since, as will be demonstrated later, the sequential Sidak procedure and the BH procedure are special cases of the strong FWER- and FDR-controlling compound MDFs, both arising under exchangeability.

5.2 Families with MLR Property



The initial simplification of the multiple decision problem to the simple null versus simple alternative hypotheses for each $m \in \mathcal{M}$ may be perceived as a serious limitation since the optimal MDF will then depend on, and hence require knowledge of, the $Q_{m1}$s to calculate the ROC functions. In settings with "large $M$, small $n$" data sets, such knowledge may not be so forthcoming. However, we point out that the simplistic approach of simply assuming, most probably erroneously, that the $(Q_{m0}, Q_{m1})$ is invariant in $m \in \mathcal{M}$, which is the exchangeable setting, will have undesirable consequences; see [13]. Historically, in the development of optimal procedures, such as in the Neyman-Pearson hypothesis testing framework, it is prudent to start with the simplest setting, which usually turns out to be the most fundamental setting. In our situation this is the case with simple null and simple alternative hypotheses. This approach was also implemented in [IS], [50], and [31]. Recall that in the development of optimal classes of test functions in the single-pair hypothesis testing problem, the role of the MP test is central. We surmise that in the multiple decision problem, the solution to the simple null versus simple alternative hypotheses problem will play a prominent role in solving the composite hypotheses setting. It appears that for an MTP to possess optimality, it will require knowledge, either exact, approximate, or estimated, of the alternative hypotheses distributions; see also [M]. We briefly touch on this aspect in the presence of the monotone likelihood ratio (MLR) property; see [28].

Consider the situation where, for each $m \in \mathcal{M}$, the density function belongs to a one-dimensional parametric family $\mathcal{F}_m = \{f_m(\cdot; \xi_m) : \xi_m \in \Gamma_m \subset \Re\}$ which possesses the MLR property. A typical pair of hypotheses to be tested would be $H_{m0} : \xi_m \le \xi_{m0}$ versus $H_{m1} : \xi_m > \xi_{m0}$, where $\xi_{m0}$ is known. With the MLR property, a uniformly most powerful (UMP) test function $\delta^*_m(X_m, U_m; \eta_m)$ of size $\eta_m$ exists, with this UMP test identical to the MP test of size $\eta_m$ for the simple null hypothesis $H_{m0} : \xi_m = \xi_{m0}$ versus the simple alternative hypothesis $H_{m1} : \xi_m = \xi_{m1}$, with $\xi_{m1} > \xi_{m0}$. When dealing with the single-pair hypothesis testing problem, we recall that exact knowledge of the value of $\xi_1$ is not necessary since the critical constants of the size-$\eta$ MP test for $H_0 : \xi = \xi_0$ versus $H_1 : \xi = \xi_1$ can be made independent of $\xi_1$. In contrast, for the multiple decision problem, to determine the optimal size allocations for each of the $M$ MP tests, for a given overall weak FWER threshold, the powers of the tests at the $\xi_{m1}$s are required, hence the need to know the values of the $\xi_{m1}$s. We propose two possible solutions to this dilemma.

The first approach is to solicit from the scientific investigator the values of the $\xi_{m1}$s for which the powers are of most interest. Such values may coincide with those that are scientifically, e.g., clinically, different from the $\xi_{m0}$s. Such elicitation, which may not be very feasible in practice if $M$ is large, but which may be made possible by forming subclasses or clusters of the $M$ genes as in [13], amounts to specifying effect sizes, analogous to that in sample size determination problems. Formation of such clusters must be made in close consultation with the investigator, or perhaps guided by the result of a preliminary cluster analysis using data independent of that used in the decision functions. For the specified $\xi_{m1}$s, the ROC functions in the determination of the optimal weak FWER-controlling multiple size vector become $\rho_m(\eta) = \pi_{\delta^*_m(\eta)}(\xi_{m1})$ for $m \in \mathcal{M}$, where $\delta^*_m(\eta)$ is the simple MP test of size $\eta$ for testing $H_{m0} : \xi_m = \xi_{m0}$ versus $H_{m1} : \xi_m = \xi_{m1}$, and $\pi_{\delta^*_m(\eta)}(\xi_{m1})$ is the power of $\delta^*_m(\eta)$ (at $\xi_m = \xi_{m1}$). In the clustered situation with $\mathcal{M} = \bigcup_{k=1}^K \mathcal{M}_k$, where $(\mathcal{M}_k, k = 1, 2, \ldots, K)$ partitions $\mathcal{M}$, we may denote by $\rho_k(\eta)$ and $\zeta_k$, respectively, the common ROC function and size for the decision functions in cluster $\mathcal{M}_k$. Under second-order differentiability of the $\rho_k(\eta)$s, by Theorem 4.3, the optimal weak FWER-$\alpha$ controlling multiple size vector $\zeta(\alpha) = (\zeta_1(\alpha), \ldots, \zeta_K(\alpha))$ is the $\zeta = (\zeta_1, \zeta_2, \ldots, \zeta_K)$ that solves the set of equations

$$\forall k = 1, 2, \ldots, K : \ \rho'_k(\zeta_k)(1 - \zeta_k) = \lambda \ \text{ for some } \lambda \in \Re;$$
$$\sum_{k=1}^K |\mathcal{M}_k| \log(1 - \zeta_k) = \log(1 - \alpha).$$
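As an illustration of how the two equations above may be solved numerically, the sketch below treats a hypothetical special case that is not taken from the text: each cluster uses a one-sided $z$-test with unit variance and mean shift $\mu_k > 0$ under the alternative, for which $\rho_k(\eta) = 1 - \Phi(\Phi^{-1}(1 - \eta) - \mu_k)$ and $\rho'_k(\eta) = \exp\{\mu_k z_\eta - \mu_k^2 / 2\}$ with $z_\eta = \Phi^{-1}(1 - \eta)$. In this case $\rho'_k(\zeta)(1 - \zeta)$ is decreasing in $\zeta$, so a nested bisection (inner over $\zeta_k$ for fixed $\lambda$, outer over $\log \lambda$) suffices. The function names, bracketing intervals, and iteration counts are assumptions of this sketch.

```python
import math
from statistics import NormalDist

_STD = NormalDist()

def roc_slope(eta, mu):
    """rho'(eta) for a one-sided unit-variance z-test with mean shift mu > 0 (assumed model)."""
    z = -_STD.inv_cdf(eta)              # upper-tail critical value for size eta
    return math.exp(mu * z - 0.5 * mu * mu)

def size_for_lambda(lam, mu, lo=1e-12, hi=1.0 - 1e-12):
    """Solve rho'(zeta) * (1 - zeta) = lam by bisection; the left side decreases in zeta."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if roc_slope(mid, mu) * (1.0 - mid) > lam:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def optimal_cluster_sizes(alpha, cluster_counts, cluster_shifts):
    """Return (zeta_1..zeta_K, lambda) solving the two displayed equations."""
    target = math.log(1.0 - alpha)

    def constraint(log_lam):
        zetas = [size_for_lambda(math.exp(log_lam), mu) for mu in cluster_shifts]
        return sum(n * math.log(1.0 - z) for n, z in zip(cluster_counts, zetas)) - target

    lo, hi = -40.0, 40.0                # the constraint is increasing in log(lambda)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if constraint(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    lam = math.exp(0.5 * (lo + hi))
    return [size_for_lambda(lam, mu) for mu in cluster_shifts], lam

# Hypothetical example: two clusters of five tests each, with shifts 1 and 3.
sizes, lam = optimal_cluster_sizes(0.05, [5, 5], [1.0, 3.0])
print(sizes, lam)
```

With a single cluster the solution reduces to the Sidak size $1 - (1 - \alpha)^{1/M}$, which gives a quick sanity check of the sketch.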

The second approach, which is analogous to what has been done in several papers such as [55], [53], [50], [ni], and [25], is to estimate or approximate the underlying values of the $\xi_m$s either using the observed data $x$, possibly using shrinkage-type estimators, or through the use of prior information, possibly informed by external covariates as in [15]. Addressing this same restriction of requiring knowledge of the simple null and simple alternative hypotheses and advocating this second approach, Roquain and van de Wiel [31] wrote: "Although leading to oracle procedures, it can be used in practice as soon as the null and alternative distributions are estimated or guessed reasonably accurately from independent data." By 'independent data' is meant in [M] data different from that used in performing the actual tests. However, such external data need not always be used for estimating or imputing the unknown parameters or distributions. For example, let us suppose that for each $m \in \mathcal{M}$, data $x_m$ could be partitioned into $(v_m, w_m)$. We may then use $\hat\xi_{m1}(v_m) = \max\{\xi_{m0}, \hat\xi_m(v_m)\}$, where $\hat\xi_m(v_m)$ is the maximum likelihood estimate of $\xi_m$ based on $v_m$. We may then proceed as in the preceding paragraph where $\xi_{m1}$ is set to $\hat\xi_{m1}(v_m)$ for each $m \in \mathcal{M}$, but with the component data $w_m$ used in the test functions. The resulting MDF will then be of an adaptive type, possibly also compound, similar to those in [53], if shrinkage estimators are used for estimating the $\xi_m$s using the $v_m$ components. Observe that if for some $m_0 \in \mathcal{M}$, $\hat\xi_{m_0 1}(v_{m_0})$ and $\xi_{m_0 0}$ are very close or identical, then a relatively small size will be allocated to the MP test for component $m_0$. (Refer to subsection 4.6 for a discussion on the behavior of the components of the optimal multiple decision size vector.) This then amounts to downgrading, or simply ignoring, the testing problem for this component, a fact that is of importance since a criticism of multiple hypotheses testing, especially when using the FDR error rate, is that an unscrupulous investigator may just keep adding irrelevant genes. When using the adaptive MDF arising from the optimal multiple decision size vector, this investigator's strategy will be foiled since the adaptive MDF will automatically downgrade the irrelevant genes.
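The data-splitting idea for a single component can be sketched as follows, under assumptions that are not part of the text: the observations are normal with unit variance, so the maximum likelihood estimate based on $v_m$ is its sample mean, and the test based on $w_m$ is a one-sided $z$-test. The function name and the rule of taking the first `n_est` observations as $v_m$ are hypothetical choices.

```python
import math
from statistics import NormalDist, fmean

def split_estimate_and_pvalue(x_m, xi_m0, n_est):
    """Use the first n_est observations (v_m) to set xi_m1, and the rest (w_m) to test."""
    v_m, w_m = x_m[:n_est], x_m[n_est:]
    xi_m1_hat = max(xi_m0, fmean(v_m))                   # max{xi_m0, MLE based on v_m}
    z = (fmean(w_m) - xi_m0) * math.sqrt(len(w_m))       # unit-variance z statistic from w_m only
    p_value = 1.0 - NormalDist().cdf(z)                  # one-sided p-value
    return xi_m1_hat, p_value

# Hypothetical data for one component, with null value xi_m0 = 0.
est, p = split_estimate_and_pvalue([0.5, 1.5, 1.0, 2.0, 0.0, 1.0], 0.0, 2)
print(est, p)
```

Because the estimate uses only $v_m$ and the $p$-value uses only $w_m$, the estimated alternative does not re-use the data entering the test, which is the point of the partition.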

This second approach, however, still requires further study. For instance, there is the issue of how to partition each $x_m$ into the $v_m$ and $w_m$ components. Furthermore, the impact of a misspecified $\xi_{m1}$, possibly arising from the estimation procedure, needs to be ascertained. For example, suppose that the true $\xi_{m1}$ is approximated by $\hat\xi_{m1}$. Then, the optimal weak FWER-controlling multiple size vector will be robust to such misspecification provided that the mappings, for $m = 1, 2, \ldots, M$,

r-m K'lmi Smi; 

are approximately invariant in $m$. Here, $\rho_m(\eta; \xi_{m1})$ is the ROC function associated with the MP decision process for testing $H_{m0} : \xi_m = \xi_{m0}$ versus $H_{m1} : \xi_m = \xi_{m1}$, and $\rho_m^{(j,k)}(\eta; \xi) = \{\partial^{j+k} / \partial\eta^j \partial\xi^k\} \rho_m(\eta; \xi)$. However, we are still studying these and related issues pertaining to the implementation of the procedures in more depth.

An alternative procedure, which we do not advocate, is to utilize the full data $x$ to estimate the $\xi_m$s, and then to utilize $x$ again in the decision functions. However, this approach could have unacceptable consequences due to re-using, or double-dipping on, the data $x$. Such re-use of the data may invalidate desired properties of the MDF, such as control of the Type I error rate; see the discussion in [15].



5.3 Connections to p-Value Statistics

Corollary 3.2 indicates that the ROC function $\eta \mapsto \rho_m(\eta)$ is differentiable if and only if the distribution function of the, possibly randomized, $p$-value statistic $S_m(X_m, U_m)$ under $H_{m1} : Q_m = Q_{m1}$ is differentiable. In this case, $\rho'_m(\cdot)$ coincides with $h_m(\cdot)$, the density function of $S_m(X_m, U_m)$ under $H_{m1} : Q_m = Q_{m1}$. The first condition (4.6) in Theorem 4.3 may then be restated in terms of these density functions via

$$h_m(\eta_m)(1 - \eta_m) = \text{constant}, \quad \forall m \in \mathcal{M}. \tag{5.10}$$

This is a surprising result as it indicates that it is not enough to simply find the sizes that maximize these $h_m(\cdot)$s, as dictated by the Neyman-Pearson Lemma when dealing with a single pair of null and alternative hypotheses. Rather, in the multiple hypotheses testing problem, there is attenuation in that larger sizes incur some penalties. This is the phenomenon referred to in subsection 4.6 as a size-investing strategy. Equation (5.10) governs the interactions among the $M$ tests regarding their size allocations to achieve the best overall result, in terms of overall Type II error, among themselves. See also the last paragraph in section 3 of [31] regarding the form of their optimal multi-weighted step-up procedure.

The optimal weak FWER-controlling MDF may be converted to a procedure based on the $p$-value statistics. If $\eta^*(\alpha) = (\eta^*_m(\alpha), m \in \mathcal{M})$ is the optimal weak FWER-$\alpha$ multiple decision size vector and $(S_m(x_m, u_m), m \in \mathcal{M})$ is the vector of computed $p$-value statistics, the decision based on data $(x, u) = ((x_m, u_m), m \in \mathcal{M})$ is $\delta^*(x, u) = (I\{S_m(x_m, u_m) \le \eta^*_m(\alpha)\}, m \in \mathcal{M})$, an MDF based on weighted $p$-values. This is related to the approach proposed in several papers of using weighted $p$-values such as in [IB], [5S], and [5^.

In our case the weights are tied to the optimal sizes. Observe that it is possible to have $S_{m_1}(x_{m_1}, u_{m_1}) < S_{m_2}(x_{m_2}, u_{m_2})$ but with $H_{m_1 0}$ not rejected and $H_{m_2 0}$ rejected, depending on the values of $\eta^*_{m_1}(\alpha)$ and $\eta^*_{m_2}(\alpha)$. In such cases, decision-making for the $M$ pairs of hypotheses need not be transitive with respect to the $p$-value statistics; see section 3 in [53] where this behavior also occurs with their oracle-based adaptive procedure for control of mFDR.
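The lack of transitivity with respect to the $p$-values can be seen in a two-component toy example. The sketch below takes the component sizes as given; the function name and all numerical values are hypothetical and are not computed from any optimality condition.

```python
def weighted_pvalue_decisions(p_values, sizes):
    """Reject H_m0 exactly when S_m <= eta_m; the sizes eta_m are taken as given."""
    return [int(s <= eta) for s, eta in zip(p_values, sizes)]

p_values = [0.004, 0.006]   # hypothetical p-values, with S_1 < S_2
sizes = [0.003, 0.010]      # hypothetical component sizes, with eta_1 < eta_2
decisions = weighted_pvalue_decisions(p_values, sizes)
print(decisions)            # the component with the smaller p-value is the one not rejected
```

Here the first null hypothesis is retained although its $p$-value is the smaller of the two, because its allotted size is smaller still.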

6 Strong FWER Control 

In this section we develop a strong FWER-controlling compound MDF using as a starting point the optimal weak FWER-controlling procedure developed in Section 4.

Let $\Delta^* = (\Delta^*_m, m \in \mathcal{M})$ be the MP MDP with $\Delta^*_m = (\delta^*_m(\eta) : \eta \in [0, 1])$ the MP decision process for $H_{m0} : Q_m = Q_{m0}$ versus $H_{m1} : Q_m = Q_{m1}$ based on $(X_m, U_m)$. Without loss of generality, we assume that the size function $A_m(\cdot)$ of $\Delta^*_m$ satisfies $A_m(\eta) = \eta$. Define the mapping $\eta : [0, 1] \to [0, 1]^{\mathcal{M}}$ such that $\eta(\alpha) = (\eta_m(\alpha), m \in \mathcal{M})$ is the optimal weak FWER-controlling multiple decision size vector at level $\alpha$. We shall assume in this section that each component of this mapping is nondecreasing and continuous, which is the case for instance when the ROC functions of $\Delta^*$ are twice-differentiable as established in Proposition

m 

For an FWER threshold of $\alpha \in [0, 1]$, the optimal MDF in $\mathcal{D}_0$ is

$$\delta^*_W(\alpha) = (\delta^*_m(\eta_m(\alpha)), m \in \mathcal{M}). \tag{6.1}$$

Associated with this MDF is the generalized multiple decision $p$-value statistic $W = (W_m, m \in \mathcal{M})$, where

$$W_m = W_m(X_m, U_m) = \inf\{\alpha \in [0, 1] : \delta^*_m(\eta_m(\alpha)) = 1\}. \tag{6.2}$$

The $W_m = W_m(x_m, u_m)$ is the smallest weak FWER size leading to rejection of $H_{m0}$ when using $\delta^*_W(\alpha)$ given data $(x, u) = ((x_m, u_m), m \in \mathcal{M})$. The usual $p$-value statistic $S_m$ [see (3.3)] for $\Delta^*_m$ is related to $W_m$ via

$$\forall m \in \mathcal{M} : \ S_m(X_m, U_m) = \eta_m(W_m(X_m, U_m)). \tag{6.3}$$

In the sequel, the statistic $Q$, which takes values in the set of permutations of $(1, 2, \ldots, M)$, and denoted by

$$Q = (Q_1, Q_2, \ldots, Q_M) = ((1), (2), \ldots, (M)), \tag{6.4}$$

represents the anti-rank vector of $W$, so that $W_{Q_1} < W_{Q_2} < \ldots < W_{Q_M}$ or, equivalently,

$$W_{(1)} < W_{(2)} < \ldots < W_{(M)}.$$

Now, à la [50] and [53], suppose an Oracle knows $Q$, the true underlying probability measure governing $X$. For the MDF $\delta^*_W(\alpha)$ in (6.1), its FWER is

$$R_0(\delta^*_W(\alpha), Q) = 1 - \prod_{m \in \mathcal{M}} [1 - \eta_m(\alpha)]^{1 - \theta_m(Q)}.$$

This is a nondecreasing and continuous function of $\alpha$ by virtue of the assumed nondecreasingness and continuity properties of the mappings $\alpha \mapsto \eta_m(\alpha)$ for each $m \in \mathcal{M}$. Now, if the Oracle desires to control this Type I error rate at a value $q^* \in [0, 1]$ and also minimize the MDR given by

$$R_2(\delta^*_W(\alpha), Q) = |\mathcal{M}_1(Q)| - \sum_{m \in \mathcal{M}_1(Q)} \rho_m(\eta_m(\alpha)),$$

where $\rho_m(\eta_m(\alpha))$ is the power of $\delta^*_m(\eta_m(\alpha))$, then She should choose the largest $\alpha \in [0, 1]$ such that $R_0(\delta^*_W(\alpha), Q) = q^*$. Owing to the continuity and nondecreasingness of $R_0(\delta^*_W(\alpha), Q)$ in $\alpha$, the Oracle's optimal $\alpha$ could also be expressed via

$$\alpha^\dagger(q^*; Q) = \inf\Big\{\alpha \in [0, 1] : \prod_{m \in \mathcal{M}} [1 - \eta_m(\alpha)]^{1 - \theta_m(Q)} < 1 - q^*\Big\},$$

with the convention that $\inf \emptyset = 1$.

However, there is no Oracle and $Q$ is not known, else there is no multiple decision problem. So $\alpha^\dagger(q^*; Q)$ is not observable! A natural idea is to estimate the unknown $\theta_m(Q)$, the state of the $m$th pair of hypotheses. An intuitive and simple estimator of $\theta_m(Q)$ for a fixed value of $\alpha$ is

$$\hat\theta_m(\alpha) = \delta^*_m(\eta_m(\alpha)-) = \delta^*_m(X_m, U_m; \eta_m(\alpha)-). \tag{6.5}$$

In turn we obtain a step-down estimator $\alpha^\dagger(q^*) = \alpha^\dagger(X, U; q^*)$ of the Oracle-based $\alpha^\dagger(q^*; Q)$ given by

$$\alpha^\dagger(q^*) = \inf\Big\{\alpha \in [0, 1] : \prod_{m \in \mathcal{M}} [1 - \eta_m(\alpha)]^{1 - \delta^*_m(\eta_m(\alpha)-)} < 1 - q^*\Big\}. \tag{6.6}$$

This then determines the compound MDF $\delta^*_S(q^*) = \delta^*_S(X, U; q^*) \in \mathcal{D}$, where

$$\delta^*_S(q^*) = (\delta^*_m(\eta_m(\alpha^\dagger(q^*))), m \in \mathcal{M}). \tag{6.7}$$

By virtue of the optimal choice of the $\eta_m(\alpha)$s and the use of the MP tests, we expect $\delta^*_S(q^*)$ to possess excellent, if not optimal, MDR-properties. By taking the infimum over the weak FWER-size $\alpha$ coupled with the estimation of $\theta_m(Q)$ by $\delta^*_m(\eta_m(\alpha)-)$ in (6.6), there occurs an adaptive downweighting of components whose $H_{m0}$s are most likely correct as dictated by the data $(x, u)$. We now establish that $\delta^*_S(q^*)$ in (6.7) does control strongly the FWER.

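Theorem 6.1 Let $q^* \in [0, 1]$. Then, $\forall Q \in \mathcal{Q}$, $R_0(\delta^*_S(q^*), Q) \le q^*$.

Proof: Fix a $q^* \in [0, 1]$ and let $Q \in \mathcal{Q}$ be the true underlying probability measure of $X$. Define the stochastic process $T_1 = \{T_1(\alpha) : \alpha \in [0, 1]\}$ via

$$T_1(\alpha) = \prod_{m \in \mathcal{M}} [1 - \eta_m(\alpha)]^{1 - \delta^*_m(\eta_m(\alpha)-)}.$$

The sample paths of $T_1$ are, a.e. $[Q]$, left-continuous with right-hand limits (caglad), piecewise nonincreasing, and with $T_1(\alpha-) = T_1(\alpha) \le T_1(\alpha+)$. In terms of $T_1$, we have $\alpha^\dagger(q^*) = \inf\{\alpha \in [0, 1] : T_1(\alpha) < 1 - q^*\}$. Consequently, $T_1(\alpha^\dagger(q^*)) \ge 1 - q^*$. Now, note that

$$R_0(\delta^*_S(q^*), Q) = 1 - P_Q\Big\{\bigcap_{m \in \mathcal{M}_0(Q)} [\delta^*_m(\eta_m(\alpha^\dagger(q^*))) = 0]\Big\}. \tag{6.8}$$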



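Observe that we could not write the last term as a product since the events concerned are no longer independent, owing to the fact that $\alpha^\dagger(q^*)$ depends on the whole data $(X, U)$. We also have that

$$\bigcap_{m \in \mathcal{M}_0(Q)} [\delta^*_m(\eta_m(\alpha^\dagger(q^*))) = 0] = \Big\{\alpha^\dagger(q^*) < \min_{m \in \mathcal{M}_0(Q)} W_m\Big\}.$$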

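Next, define the stochastic process $T_2 = \{T_2(\alpha) : \alpha \in [0, 1]\}$ via

$$T_2(\alpha) \equiv T_2(\alpha; Q) = \Big\{\prod_{m \in \mathcal{M}_0(Q)} [1 - \eta_m(\alpha)]\Big\} \Big\{\prod_{m \in \mathcal{M}_1(Q)} [1 - \eta_m(\alpha)]^{1 - \delta^*_m(\eta_m(\alpha)-)}\Big\}.$$

This process, which depends on the unknown $Q$, also has caglad sample paths. Let

$$\alpha^\#(q^*) = \alpha^\#(q^*; Q) = \inf\{\alpha \in [0, 1] : T_2(\alpha; Q) < 1 - q^*\}.$$

In contrast to $\alpha^\dagger(q^*)$, $\alpha^\#(q^*)$ is not observable since it depends on the unknown $Q$. Note that we also have $T_2(\alpha^\#(q^*)) \ge 1 - q^*$. It is also clear from their definitions that $T_1(\alpha) \ge T_2(\alpha; Q)$. As such, since $T_1(\alpha) < 1 - q^*$ implies $T_2(\alpha; Q) < 1 - q^*$, it follows that $\alpha^\dagger(q^*) \ge \alpha^\#(q^*)$. More importantly, observe that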



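$$\Big\{\alpha^\dagger(q^*) < \min_{m \in \mathcal{M}_0(Q)} W_m\Big\} = \Big\{\alpha^\#(q^*) < \min_{m \in \mathcal{M}_0(Q)} W_m\Big\}.$$

To see this, the inclusion $\subseteq$ is immediate from the last result. Conversely, $\{\alpha^\#(q^*) < \min_{m \in \mathcal{M}_0(Q)} W_m\}$ implies that for some $\alpha_0 < \min_{m \in \mathcal{M}_0(Q)} W_m$ we have $T_2(\alpha_0; Q) < 1 - q^*$. For such an $\alpha_0$, $\delta^*_m(\eta_m(\alpha_0)-) = 0$ for all $m \in \mathcal{M}_0(Q)$, so that $T_1(\alpha_0) = T_2(\alpha_0; Q)$. Thus, $\alpha_0 \in \{\alpha \in [0, 1] : T_1(\alpha) < 1 - q^*\}$. Consequently, $\alpha^\dagger(q^*) = \inf\{\alpha \in [0, 1] : T_1(\alpha) < 1 - q^*\} \le \alpha_0 < \min_{m \in \mathcal{M}_0(Q)} W_m$. Thus, the $\supseteq$ inclusion holds, completing the proof of the set equality.

Invoking the iterated expectation rule and the above equivalences, we have

$$P_Q\Big\{\bigcap_{m \in \mathcal{M}_0(Q)} [\delta^*_m(\eta_m(\alpha^\dagger(q^*))) = 0]\Big\} = P_Q\Big\{\alpha^\dagger(q^*) < \min_{m \in \mathcal{M}_0(Q)} W_m\Big\}$$
$$= P_Q\Big\{\alpha^\#(q^*) < \min_{m \in \mathcal{M}_0(Q)} W_m\Big\}$$
$$= E_Q\Big[P_Q\Big\{\alpha^\#(q^*) < \min_{m \in \mathcal{M}_0(Q)} W_m \,\Big|\, \alpha^\#(q^*)\Big\}\Big].$$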



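The quantity $\alpha^\#(q^*)$ is measurable with respect to $\sigma\{\delta^*_m : m \in \mathcal{M}_1(Q)\}$, whereas the random variable $\min_{m \in \mathcal{M}_0(Q)} W_m$ is measurable with respect to $\sigma\{\delta^*_m : m \in \mathcal{M}_0(Q)\}$. These sub-sigma-fields are independent since the $\delta^*_m$s are simple decision functions and by virtue of the independence condition (I). As a consequence, for $w \in [0, 1]$,

$$P_Q\Big\{\min_{m \in \mathcal{M}_0(Q)} W_m > w\Big\} = P_Q\Big\{\bigcap_{m \in \mathcal{M}_0(Q)} [\delta^*_m(\eta_m(w)) = 0]\Big\}$$
$$= \prod_{m \in \mathcal{M}_0(Q)} P_Q\{\delta^*_m(\eta_m(w)) = 0\} = \prod_{m \in \mathcal{M}_0(Q)} [1 - \eta_m(w)],$$

with the product arising because of the independence of the $(\delta^*_m : m \in \mathcal{M}_0(Q))$ from condition (I) and since the $\delta^*_m$s are simple. Therefore,

$$P_Q\Big\{\bigcap_{m \in \mathcal{M}_0(Q)} [\delta^*_m(\eta_m(\alpha^\dagger(q^*))) = 0]\Big\} = E_Q\Big\{\prod_{m \in \mathcal{M}_0(Q)} [1 - \eta_m(\alpha^\#(q^*))]\Big\}$$
$$\ge E_Q\{T_2(\alpha^\#(q^*))\}$$
$$\ge E_Q(1 - q^*)$$
$$= 1 - q^*.$$

Using this result and (6.8), we have $R_0(\delta^*_S(q^*), Q) \le 1 - (1 - q^*) = q^*$, proving the theorem.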

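Let us relate $\delta^*_S(q^*)$ to the $p$-value and generalized $p$-value statistics $S$ and $W$, respectively. Define the random variable

$$J^\dagger(q^*) = \max\{j \in \mathcal{M} : T_1(W_{(i)}) \ge 1 - q^*, \ i = 1, 2, \ldots, j\}$$
$$= \max\Big\{j \in \mathcal{M} : \prod_{m=i}^{M} [1 - \eta_{(m)}(W_{(i)})] \ge 1 - q^*, \ i = 1, 2, \ldots, j\Big\}.$$

Then, $\alpha^\dagger(q^*) \in [W_{(J^\dagger(q^*))}, W_{(J^\dagger(q^*)+1)})$, so we may re-express $\delta^*_S(q^*)$ via

$$\delta^*_S(q^*) = (I\{W_m \le W_{(J^\dagger(q^*))}\}, m \in \mathcal{M}).$$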

Suppose now that we are in the exchangeable setting where $Q_{m0} = Q_0$ and $Q_{m1} = Q_1$ for all $m \in \mathcal{M}$, so that the ROC functions for the $M$ decision processes are identical. The optimal weak FWER-controlling multiple decision size vector will have identical components and will coincide with the Sidak sizes, that is, for each $m \in \mathcal{M}$, $\eta_m(\alpha) = 1 - (1 - \alpha)^{1/M} \equiv \eta^S(\alpha)$. From (6.3) we then have that for each $m \in \mathcal{M}$,

$$S_{(m)} = 1 - (1 - W_{(m)})^{1/M} \quad \text{and} \quad W_{(m)} = 1 - (1 - S_{(m)})^{M}.$$

It follows that

$$J^\dagger(q^*) = \max\Big\{j \in \mathcal{M} : \prod_{m=i}^{M} (1 - W_{(i)})^{1/M} \ge 1 - q^*, \ i = 1, 2, \ldots, j\Big\}$$
$$= \max\{j \in \mathcal{M} : (1 - W_{(i)})^{(M-i+1)/M} \ge 1 - q^*, \ i = 1, 2, \ldots, j\}$$
$$= \max\{j \in \mathcal{M} : W_{(i)} \le 1 - (1 - q^*)^{M/(M-i+1)}, \ i = 1, 2, \ldots, j\}$$
$$= \max\{j \in \mathcal{M} : S_{(i)} \le 1 - (1 - q^*)^{1/(M-i+1)}, \ i = 1, 2, \ldots, j\}.$$

Therefore, in terms of the usual $p$-value statistics $(S_1, S_2, \ldots, S_M)$, $H_{m0}$ will be rejected if and only if the rank of $S_m$ is at most $J^\dagger(q^*)$. This is precisely the FWER-controlling step-down Sidak compound MDF; see, for instance, Procedure 3.9 on page 123 in Dudoit and van der Laan [9]. We have thus established that the sequential step-down Sidak procedure is a special case of the compound MDF $\delta^*_S(q^*)$, arising under the exchangeable setting.
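The reduction to the step-down Sidak procedure can be checked numerically. The sketch below computes the step-down count from the generalized $p$-values for an arbitrary user-supplied size mapping and compares it with the familiar step-down Sidak rule in the exchangeable case. The function names are hypothetical, the size mapping is passed in as a function returning the whole size vector at a given level, ties among the generalized $p$-values are assumed absent, and the $p$-values are made up for illustration.

```python
import math

def step_down_count(W, eta, q):
    """Largest j with prod_{m=i..M} [1 - eta_(m)(W_(i))] >= 1 - q for every i = 1..j."""
    order = sorted(range(len(W)), key=lambda m: W[m])      # anti-ranks of W
    j = 0
    for i, m_i in enumerate(order):
        sizes = eta(W[m_i])                                # size vector at level W_(i)
        remaining = math.prod(1.0 - sizes[m] for m in order[i:])
        if remaining < 1.0 - q:
            break                                          # step-down: stop at the first failure
        j = i + 1
    return j, order[:j]

def sidak_step_down_count(S, q):
    """Step-down Sidak: compare S_(i) with 1 - (1 - q)^(1 / (M - i + 1)) sequentially."""
    M, j = len(S), 0
    for i, s in enumerate(sorted(S), start=1):
        if s > 1.0 - (1.0 - q) ** (1.0 / (M - i + 1)):
            break
        j = i
    return j

S = [0.001, 0.30, 0.012, 0.04, 0.008]                      # hypothetical p-values
M = len(S)
W = [1.0 - (1.0 - s) ** M for s in S]                      # generalized p-values under Sidak sizes
eta_sidak = lambda alpha: [1.0 - (1.0 - alpha) ** (1.0 / M)] * M

j_general, rejected = step_down_count(W, eta_sidak, 0.05)
j_sidak = sidak_step_down_count(S, 0.05)
print(j_general, sorted(rejected), j_sidak)
```

Passing a non-constant size mapping to `step_down_count` gives the general, non-exchangeable form of the same step-down rule.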



7 Strong FDR Control 

In this section we obtain an FDR-controlling compound MDF anchored on the weak FWER-controlling simple MDF $\delta^*_W(\alpha)$ in (6.1). We consider the same framework as in Section 6. In particular we assume that, for each $m \in \mathcal{M}$, the mapping $\alpha \in [0, 1] \mapsto \eta_m(\alpha)$ is nondecreasing and continuous, where $\eta(\alpha) = (\eta_m(\alpha), m \in \mathcal{M})$ is the optimal weak FWER-$\alpha$ multiple size vector. Our idea in obtaining an FDR-controlling MDF builds on the development of the BH MDF, specifically Benjamini and Hochberg's [1] rationale for their Theorem 2.

Let $q^* \in [0, 1]$ be the desired FDR threshold and $Q$ be the underlying probability measure of $X$. We introduce two stochastic processes: $T_0 = \{T_0(\alpha; Q) : \alpha \in [0, 1]\}$ and $T = \{T(\alpha) : \alpha \in [0, 1]\}$, where

$$T_0(\alpha; Q) = \sum_{m \in \mathcal{M}_0(Q)} \delta^*_m(\eta_m(\alpha)) \quad \text{and} \quad T(\alpha) = \sum_{m \in \mathcal{M}} \delta^*_m(\eta_m(\alpha)).$$

For the MDF $\delta^*_W(\alpha)$, its FDR is

$$R_1(\delta^*_W(\alpha), Q) = E_Q\Big[\frac{T_0(\alpha; Q)}{T(\alpha)} I\{T(\alpha) > 0\}\Big].$$

By definition of the generalized $p$-value statistics $W_m$s in (6.2) we have for $\alpha \in [W_{(m)}, W_{(m+1)})$ that $T(\alpha) = m$, whereas

$$E_Q\{T_0(\alpha; Q)\} = \sum_{m \in \mathcal{M}} (1 - \theta_m(Q)) \eta_m(\alpha) \le \sum_{m \in \mathcal{M}} \eta_m(\alpha). \tag{7.1}$$

Suppose we focus on an $\alpha \in [W_{(m)}, W_{(m+1)})$. If $\sum_{j \in \mathcal{M}} \eta_j(W_{(m)}) \le m q^*$, then the best $\alpha$ in this interval will be the largest value satisfying $\sum_{j \in \mathcal{M}} \eta_j(\alpha) \le m q^*$, since by increasing $\alpha$, the MDR decreases as argued in the development of $\delta^*_S(q^*)$ in Section 6. This motivates our definition of $\alpha^*(q^*) = \alpha^*(X, U; q^*)$ as the step-up estimator

$$\alpha^*(q^*) = \sup\Big\{\alpha \in [0, 1] : \sum_{m \in \mathcal{M}} \eta_m(\alpha) \le q^* \sum_{m \in \mathcal{M}} \delta^*_m(\eta_m(\alpha))\Big\}. \tag{7.2}$$

This induces a compound MDF $\delta^*_F(q^*) = \delta^*_F(X, U; q^*) \in \mathcal{D}$ given by

$$\delta^*_F(q^*) = (\delta^*_m(\eta_m(\alpha^*(q^*))), m \in \mathcal{M}). \tag{7.3}$$

Another possibility, which may lead to further improvements, is to estimate $\theta_m(Q)$ in (7.1) by $\delta^*_m(\eta_m(\alpha)-)$, as in Section 6, and replace $\sum_{m \in \mathcal{M}} \eta_m(\alpha)$ in (7.2) by $\sum_{m \in \mathcal{M}} [1 - \delta^*_m(\eta_m(\alpha)-)] \eta_m(\alpha)$. This may extend the adaptive procedure in [2]; however, we defer its consideration for future work. We establish that $\delta^*_F(q^*)$ controls the FDR at $q^*$.

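Theorem 7.1 Let $q^* \in [0, 1]$. If, $\forall Q \in \mathcal{Q} \setminus \{Q_0\}$ and $\forall \alpha \in (0, 1)$,

$$|\mathcal{M}_0(Q)| \max_{m \in \mathcal{M}_0(Q)} \eta_m(\alpha) \le \sum_{m \in \mathcal{M}} \eta_m(\alpha), \tag{7.4}$$

then $R_1(\delta^*_F(q^*), Q) \le q^*$ for all $Q \in \mathcal{Q}$.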

Proof: The cases $q^* = 0$ and $q^* = 1$ are clearly trivial, so let $q^* \in (0, 1)$, and let $Q \in \mathcal{Q}$ be the underlying probability measure of $X$. First, we note that for $\forall \alpha \in [0, 1]$, $\eta_m(\alpha) \le \alpha$ for all $m \in \mathcal{M}$. For each $\alpha \in (0, 1]$, let us define the sigma-field

$$\mathcal{F}_\alpha = \sigma\{\delta^*_m(X_m, U_m; \eta_m(\beta)) : m \in \mathcal{M}, \ \alpha \le \beta \le 1\},$$

and let $\mathcal{F}_0 = \bigvee_{\alpha \in (0, 1]} \mathcal{F}_\alpha$. Observe that for $0 \le \alpha \le \beta \le 1$, $\mathcal{F}_\alpha \supseteq \mathcal{F}_\beta$. Denote by $\mathbb{F} = \{\mathcal{F}_\alpha : \alpha \in [0, 1]\}$ the induced filtration. For conciseness, we shall drop $(X_m, U_m)$ in $\delta^*_m(X_m, U_m; \eta_m)$ and simply write $\delta^*_m(\eta_m)$, and also write $\alpha^*$ for $\alpha^*(q^*)$. Observe that the stochastic processes $T_0$ and $T$ defined earlier are $\mathbb{F}$-adapted.

For the MDF $\delta^*_F = \delta^*_F(q^*)$ in (7.3), its FDR is

$$R_1(\delta^*_F, Q) = E\Big\{E_Q\Big[\frac{T_0(\alpha^*; Q)}{T(\alpha^*)} I\{T(\alpha^*) > 0\} \,\Big|\, \mathcal{M}_0\Big]\Big\},$$

where, given $Q$, $\mathcal{M}_0 = \mathcal{M}_0(Q)$ is a random, albeit degenerate, subset of $\mathcal{M}$. Thus, the outer expectation is an expectation with respect to this degenerate probability measure. Let us focus on the inner expectation. If $M_0 \equiv |\mathcal{M}_0| = 0$, then this expectation is zero since $T_0(\alpha^*) = 0$, hence bounded by $q^*$.

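Consider the case where $M_0 \in \{1, 2, \ldots, M - 1\}$. From the definition of $\alpha^* = \alpha^*(q^*)$ in (7.2), we see that it is an $\mathbb{F}$-stopping time. Also, observe that at $\alpha = \alpha^*$, we have the inequality $T(\alpha^*) \ge \sum_{m \in \mathcal{M}} \eta_m(\alpha^*) / q^* = \eta_\bullet(\alpha^*) / q^*$ with $\eta_\bullet(\alpha) = \sum_{m \in \mathcal{M}} \eta_m(\alpha)$. Consequently,

$$E_Q\Big[\frac{T_0(\alpha^*)}{T(\alpha^*)} I\{T(\alpha^*) > 0\} \,\Big|\, \mathcal{M}_0\Big] \le q^* E_Q\Big[\frac{T_0(\alpha^*)}{\eta_\bullet(\alpha^*)} I\{T(\alpha^*) > 0\} \,\Big|\, \mathcal{M}_0\Big] = q^* E_Q\Big[\frac{T_0(\alpha^*)}{\eta_\bullet(\alpha^*)} \,\Big|\, \mathcal{M}_0\Big], \tag{7.5}$$

the last equality following since $T_0(\alpha) I\{T(\alpha) > 0\} = T_0(\alpha)$ for every $\alpha \in [0, 1]$. Next, consider the $\mathbb{F}$-adapted process $T_0^* = \{T_0^*(\alpha) : \alpha \in (0, 1]\}$ with

$$T_0^*(\alpha) \equiv T_0^*(\alpha; Q) = \sum_{m \in \mathcal{M}_0} \frac{\delta^*_m(\eta_m(\alpha))}{\eta_m(\alpha)}.$$

We have, for $0 < \alpha < \beta \le 1$, that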



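$$E_Q\{T_0^*(\alpha) \mid \mathcal{F}_\beta\} = \sum_{m \in \mathcal{M}_0} \frac{1}{\eta_m(\alpha)} E_Q\{\delta^*_m(\eta_m(\alpha)) \mid \mathcal{F}_\beta\}$$
$$= \sum_{m \in \mathcal{M}_0} \frac{1}{\eta_m(\alpha)} \, \delta^*_m(\eta_m(\beta)) \, \frac{P_{Q_{m0}}\{\delta^*_m(\eta_m(\alpha)) = 1\}}{P_{Q_{m0}}\{\delta^*_m(\eta_m(\beta)) = 1\}}$$
$$= \sum_{m \in \mathcal{M}_0} \frac{\delta^*_m(\eta_m(\beta))}{\eta_m(\beta)} = T_0^*(\beta),$$

where the second equality follows from Condition (I) in (2.3) together with the assumed nondecreasing property of the mapping $\alpha \mapsto \eta_m(\alpha)$, which implies that for $\alpha < \beta$, $\{\delta^*_m(\eta_m(\alpha)) = 1\} \subseteq \{\delta^*_m(\eta_m(\beta)) = 1\}$. The third equality follows from the size condition $P_{Q_{m0}}\{\delta^*_m(\eta_m) = 1\} = E_{Q_{m0}}[\delta^*_m(\eta_m)] = \eta_m$ on the MP tests or the decision processes $\Delta^*_m$s. This sequence of equalities establishes the result that $\{(T_0^*(\alpha), \mathcal{F}_\alpha) : \alpha \in (0, 1]\}$ is a reverse martingale process. Define $T_0^*(0) = \liminf_{\alpha \downarrow 0} T_0^*(\alpha) = \limsup_{\alpha \downarrow 0} T_0^*(\alpha)$, which, by Doob's martingale convergence theorem, is well-defined. Then the extended collection $\{(T_0^*(\alpha), \mathcal{F}_\alpha) : \alpha \in [0, 1]\}$ is a reverse martingale process. The expectation portion of the upper bound in (7.5) becomes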

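$$E_Q\Big[\frac{T_0(\alpha^*)}{\eta_\bullet(\alpha^*)} \,\Big|\, \mathcal{M}_0\Big] = E_Q\Big[\frac{1}{\eta_\bullet(\alpha^*)} \sum_{m \in \mathcal{M}_0} \delta^*_m(\eta_m(\alpha^*)) \,\Big|\, \mathcal{M}_0\Big]$$
$$= E_Q\Big[\sum_{m \in \mathcal{M}_0} \frac{\eta_m(\alpha^*)}{\eta_\bullet(\alpha^*)} \, \frac{\delta^*_m(\eta_m(\alpha^*))}{\eta_m(\alpha^*)} \,\Big|\, \mathcal{M}_0\Big]$$
$$\le \Big[\sup_{\alpha \in (0, 1]} \frac{\max_{m \in \mathcal{M}_0} \eta_m(\alpha)}{\eta_\bullet(\alpha)}\Big] E_Q\{T_0^*(\alpha^*) \mid \mathcal{M}_0\}$$
$$\le \frac{1}{M_0} E_Q\{T_0^*(\alpha^*) \mid \mathcal{M}_0\}$$
$$= \frac{1}{M_0} E_Q\{T_0^*(1) \mid \mathcal{M}_0\}$$
$$= \frac{1}{M_0} \sum_{m \in \mathcal{M}_0} \frac{E_{Q_{m0}}\{\delta^*_m(\eta_m(1)) \mid \mathcal{M}_0\}}{\eta_m(1)} = 1,$$

where the condition (7.4) was used to get the second inequality and the Optional Sampling Theorem for martingales [6] to get the third equality. Thus, we have established that

$$E_Q\Big[\frac{T_0(\alpha^*)}{T(\alpha^*)} I\{T(\alpha^*) > 0\} \,\Big|\, \mathcal{M}_0\Big] \le q^*$$

for any $\mathcal{M}_0$ provided $|\mathcal{M}_0| = M_0 \le M - 1$. Taking expectation with respect to $\mathcal{M}_0$, whose distribution, as pointed out earlier, is degenerate, yields the result that $R_1(\delta^*_F, Q) \le q^*$.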

Finally, consider $M_0 = M$, that is, all $H_{m0}$s are correct. Recall the Sidak size vector for weak FWER control at $\alpha$ given by

$$\eta^S_m(\alpha) = \eta^S(\alpha) = 1 - (1 - \alpha)^{1/M}, \quad m \in \mathcal{M}.$$

The vector $\eta^S(\alpha) = (\eta^S_m(\alpha) : m \in \mathcal{M})$ clearly satisfies condition (7.4). With $\eta^S_\bullet(\alpha) = M \eta^S(\alpha)$, let $\alpha^S = \alpha^S(q^*) = \alpha^S(X, U; q^*)$ be

$$\alpha^S = \sup\Big\{\alpha \in [0, 1] : \eta^S_\bullet(\alpha) \le q^* \sum_{m \in \mathcal{M}} \delta^*_m(\eta^S_m(\alpha))\Big\} \tag{7.6}$$

and its associated MDF given by $\delta^S = (\delta^*_m(\eta^S_m(\alpha^S)) : m \in \mathcal{M})$. Since in the proof for the case with $M_0 \in \{0, 1, 2, \ldots, M - 1\}$ it is not necessary that the size vector $(\eta_m(\alpha), m \in \mathcal{M})$ be the weak FWER-controlling optimal size vector, the proof also holds when we use the Sidak size vector. Furthermore, since the Sidak size vector satisfies condition (7.4) even when $M_0 = M$, then for the Sidak sizes, we have for all $Q \in \mathcal{Q}$, including $Q_0$, that $R_1(\delta^S, Q) \le q^*$. Define, for $\alpha \in [0, 1]$, the processes

$$V^*(\alpha) = \frac{\sum_{m \in \mathcal{M}} \delta^*_m(\eta_m(\alpha))}{\eta_\bullet(\alpha)} \quad \text{and} \quad V^S(\alpha) = \frac{\sum_{m \in \mathcal{M}} \delta^*_m(\eta^S_m(\alpha))}{\eta^S_\bullet(\alpha)},$$

with $V^*(0) = V^S(0) = 0$. Observe that, under $Q_0$, the expectations of $V^*(\alpha)$ and $V^S(\alpha)$ are both equal to 1 for each $\alpha \in (0, 1]$. We may re-express both $\alpha^*$ and $\alpha^S$ via

$$\alpha^* = \sup\{\alpha \in [0, 1] : V^*(\alpha) \ge 1/q^*\};$$
$$\alpha^S = \sup\{\alpha \in [0, 1] : V^S(\alpha) \ge 1/q^*\}.$$

Since $q^* < 1$, by Lemma 7.1, which is stated and established below, it follows that

$$P_{Q_0}\{V^*(\alpha) \ge 1/q^*\} \le P_{Q_0}\{V^S(\alpha) \ge 1/q^*\}.$$

This implies that, under $Q_0$, $\alpha^* \stackrel{st}{\le} \alpha^S$. From this it follows that

$$R_1(\delta^*_F, Q_0) = P_{Q_0}(\alpha^* > 0) \le P_{Q_0}(\alpha^S > 0) = R_1(\delta^S, Q_0) \le q^*.$$

Though not essential in the proof, notice that, in fact, $R_1(\delta^S, Q_0) = q^*$. This is seen by noting that under the Sidak size vector $\eta^S$ and the associated MDF $\delta^S$, all the inequalities in the proof are in fact equalities. We have thus completed the proof that, whatever $Q$ is, $R_1(\delta^*_F, Q) \le q^*$. ||

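Lemma 7.1 Let $(V_m, m \in \mathcal{M})$ be independent random variables with $V_m \sim \mathrm{Ber}(\eta_m)$ where $\eta = (\eta_m, m \in \mathcal{M}) \in [0, 1]^{\mathcal{M}}$. For $a > 0$, define

Then, $\forall \alpha \in (0, 1)$ and $\forall a \in [1, \infty)$, $\sup\{h_a(\eta) : \eta \in UB(C_\alpha)\} = h_a(\eta^S(\alpha))$.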

Proof: Let $Z_1, Z_2, \ldots, Z_M$ be independent random variables with $Z_m \sim \mathrm{Ber}(p_m)$ and denote by $\bar p = \frac{1}{M} \sum_{m=1}^{M} p_m$. For $t > 0$, let

$$h^\circ_t(p_1, p_2, \ldots, p_M) = P\Big\{\sum_{m=1}^{M} Z_m \ge t\Big\}.$$

In Hoeffding [21] (see also pages 375-376 of Marshall and Olkin [30]) it was proved that if $M \bar p \le t \le M$, then $h^\circ_t(p_1, p_2, \ldots, p_M) \le h^\circ_t(\bar p, \bar p, \ldots, \bar p)$. In the setting of the lemma, define $p_m = p_m(\eta) = -\log(1 - \eta_m)$ for $m \in \mathcal{M}$. Then, $\eta \in UB(C_\alpha) \Leftrightarrow \sum_{m=1}^{M} p_m(\eta_m) = -\log(1 - \alpha)$. For $a \ge 1$, we are then able to apply the result in [21] to conclude that for all $\eta \in UB(C_\alpha)$,

$$h_a(\eta) = h^\circ_{-a \log(1 - \alpha)}(p_1(\eta_1), \ldots, p_M(\eta_M))$$
$$\le h^\circ_{-a \log(1 - \alpha)}\big(-\tfrac{1}{M} \log(1 - \alpha), \ldots, -\tfrac{1}{M} \log(1 - \alpha)\big) = h_a(\eta^S).$$

This proves the lemma. ||

We establish that Sp{q*) reduces to the BH procedure under the exchangeable setting. 

Corollary 7.1 // the ROC functions are identical, then Sp{q*) is the FDR-q* controlling 
MDF in Benjamini and Hochberg fJ^. 

Proof: The condition implies that Vm G Ai,rim{<y) = ??(«) for some //(a). Therefore, (17. 2p 
becomes 



«*(g*) = sup Le[0,l]: Mr^ia) < q* Yl [ • 



(7.7) 



Relabeling 77(a) by just a in (17. 7p and noting that each S^{a) could be re-expressed via 
S^{a) = I{Sm{Xm, Urn) < «}, then (17.71) becomes 

a^(g*) = sup <^ a G [0, 1] : « < |^ I{Sm{Xm. <a}\. 

L m=l J 

With $S_{(1)} \le S_{(2)} \le \ldots \le S_{(M)}$ denoting the ordered p-value statistics, we may define the
random variable, as in [1],

\[ J^S(q^*) = J^S(X, U; q^*) = \max\Big\{ m \in \mathcal{M} : S_{(m)} \le \frac{q^* m}{M} \Big\}. \qquad (7.8) \]

It is then easy to see that $\alpha^\dagger(q^*) \in [S_{(J^S(q^*))}, S_{(J^S(q^*)+1)})$. Now, when using the MDF $\delta^*_F(q^*)$,
$H_{m0}$ is rejected if and only if $S_m \le \alpha^\dagger(q^*)$, but from the above relation, this occurs if and
only if the rank of $S_m$ is no more than $J^S(q^*)$. This latter procedure is precisely the BH
FDR-$q^*$ controlling MDF in [1]. ||
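
The equivalence used in this proof can be checked numerically. The following is a minimal Python sketch, not the authors' implementation: the function names are illustrative, and it assumes continuous p-values without ties. It computes the BH rejections once through the rank rule (7.8) and once through the threshold form, using the fact that the supremum defining the threshold equals $q^* J/M$ when $J$ is the number of BH rejections.

```python
import numpy as np

def bh_num_rejections(p, q):
    """J = max{m : p_(m) <= q*m/M}, with J = 0 when no m qualifies."""
    p_sorted = np.sort(np.asarray(p, dtype=float))
    M = p_sorted.size
    ok = np.nonzero(p_sorted <= q * np.arange(1, M + 1) / M)[0]
    return 0 if ok.size == 0 else int(ok[-1]) + 1

def bh_reject(p, q):
    """Rank form: reject H_m0 iff the rank of p_m is at most J."""
    p = np.asarray(p, dtype=float)
    J = bh_num_rejections(p, q)
    reject = np.zeros(p.size, dtype=bool)
    reject[np.argsort(p, kind="stable")[:J]] = True
    return reject

def threshold_reject(p, q):
    """Threshold form: reject iff p_m <= alpha, where
    alpha = sup{a in [0,1] : a <= (q/M) * #{p_m <= a}} = q*J/M."""
    p = np.asarray(p, dtype=float)
    alpha = q * bh_num_rejections(p, q) / p.size
    return p <= alpha

if __name__ == "__main__":
    pvals = [0.01, 0.04, 0.03, 0.20, 0.50]
    print(bh_reject(pvals, 0.1))
    print(threshold_reject(pvals, 0.1))
```

Both functions return the same rejection pattern on tie-free inputs, which is the content of the relation $\alpha^\dagger(q^*) \in [S_{(J)}, S_{(J+1)})$.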

Notice in the preceding proof that the BH MDF, denoted $\delta^{BH}(q^*)$, coincides with the
Sidak-size based MDF $\delta^S(q^*)$. The martingale proof for Theorem 7.1 thus carries over to
establishing strong FDR control by $\delta^{BH}(q^*)$. A martingale-based proof of FDR control by
$\delta^{BH}(q^*)$ was also given in [52]. Furthermore, we point out that the MDF $\delta^*_F(q^*)$ fulfills, in an
exact manner, the FDR constraint. In contrast, the estimated ODP MDF in [50] only
approximately satisfies this constraint.

Some remarks are in order regarding condition (7.4). As noted above, the Sidak multiple
decision size vector, which could be viewed as the optimal multiple decision size vector
when the ROC functions are identical, always satisfies (7.4). In general, when not in the
exchangeable setting, condition (7.4) induces a form of control of the differences of these
ROC functions. We conjecture that a weaker condition is possible to still achieve FDR-control
by the MDF $\delta^*_F$. But a non-martingale-based proof may be needed to resolve this.
We also provide an alternative form of $\delta^*_F(q^*)$ in terms of the generalized p-value statistics
$W_m$s. Define



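\[ J^*(q^*) = J^*(X, U; q^*) = \max\Big\{ m \in \mathcal{M} : \sum_{j \in \mathcal{M}} \eta_j(W_{(m)}) \le q^* m \Big\}. \qquad (7.9) \]

Then $\delta^*_F(q^*)$ rejects $H_{(m)0}$ for $m \in \{1, 2, \ldots, J^*(q^*)\}$ and accepts $H_{(m)0}$ for $m \in \{J^*(q^*) + 1,
J^*(q^*) + 2, \ldots, M\}$.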
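
As an illustration of (7.9), the following Python sketch computes $J^*(q^*)$ from a vector of generalized p-values. It is only a sketch under stated assumptions: `eta_sum` is a hypothetical user-supplied callable returning $\sum_j \eta_j(\alpha)$, and `sidak_eta_sum` encodes the exchangeable case, where each size is taken to be $1 - (1-\alpha)^{1/M}$. In that case the rule should return the same count as the BH step-up rule applied to $S_m = 1 - (1 - W_m)^{1/M}$.

```python
import numpy as np

def j_star(w, eta_sum, q):
    """Largest m with eta_sum(W_(m)) <= q*m, or 0 if no m qualifies.

    w       : generalized p-value statistics W_1, ..., W_M
    eta_sum : callable a -> sum_j eta_j(a) (hypothetical interface)
    q       : FDR threshold q*
    """
    w_sorted = np.sort(np.asarray(w, dtype=float))
    M = w_sorted.size
    lhs = np.array([eta_sum(a) for a in w_sorted])
    ok = np.nonzero(lhs <= q * np.arange(1, M + 1))[0]
    return 0 if ok.size == 0 else int(ok[-1]) + 1

def sidak_eta_sum(M):
    """Sum of Sidak sizes: M * (1 - (1 - a)^(1/M)); exchangeable case only."""
    return lambda a: M * (1.0 - (1.0 - a) ** (1.0 / M))

if __name__ == "__main__":
    s = np.array([0.01, 0.03, 0.04, 0.20, 0.50])   # ordinary p-values
    w = 1.0 - (1.0 - s) ** s.size                  # relabeled to the W scale
    print(j_star(w, sidak_eta_sum(s.size), 0.1))
```

With non-identical ROC functions, only `eta_sum` changes; the step-up search over the ordered $W_{(m)}$ stays the same.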

Let us also examine further the $W_m$s. Focussing on $W_{(1)}$, under $Q_0$, we have, for $\alpha \in (0,1)$,
$P_{Q_0}(W_{(1)} > \alpha) = P_{Q_0}\{\cap_{m \in \mathcal{M}} [\delta^*_m(\eta_m(\alpha)) = 0]\} = \prod_{m \in \mathcal{M}} [1 - \eta_m(\alpha)] = 1 - \alpha$, using the
independence of the $\delta^*_m$s under $Q_0$. Thus, $W_{(1)}$ is standard uniform when all null hypotheses
are correct. Using this uniformity result and the following lemma about lower and upper
bounds of $\eta_\bullet$, for $\eta \in UB(C_\alpha)$, we obtain in Proposition 7.1 a lower bound for $R_1(\delta^*_F(q^*), Q_0)$,
which is the FDR when all the null hypotheses are correct.

Lemma 7.2 Every $\eta \in UB(C_\alpha)$ satisfies

\[ \alpha \le \eta_\bullet = \sum_{m \in \mathcal{M}} \eta_m \le \min\{ -\log(1 - \alpha),\ M[1 - (1 - \alpha)^{1/M}] \}. \]

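Proof: For $\eta \in UB(C_\alpha)$, let $V_1, V_2, \ldots, V_M$ be independent Bernoulli random variables with
$V_m \sim Ber(\eta_m)$. Bonferroni's inequality yields

\[ \alpha = 1 - \prod_{m \in \mathcal{M}} [1 - \eta_m] = P\Big\{ \bigcup_{m \in \mathcal{M}} [V_m = 1] \Big\} \le \sum_{m \in \mathcal{M}} P\{V_m = 1\} = \sum_{m \in \mathcal{M}} \eta_m = \eta_\bullet, \]

establishing the left-hand inequality.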

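Since for every $a \in [0,1)$, $-\log(1 - a) \ge a$, then from the constraint condition $\sum_{m \in \mathcal{M}} \log(1 -
\eta_m) = \log(1 - \alpha)$, we obtain $\eta_\bullet \le -\log(1 - \alpha)$. But since $a \mapsto \log(1 - a)$ is concave in $[0,1)$,
then

\[ \log\{(1 - \alpha)^{1/M}\} = \frac{1}{M} \sum_{m \in \mathcal{M}} \log(1 - \eta_m) \le \log\Big( 1 - \frac{1}{M} \sum_{m \in \mathcal{M}} \eta_m \Big). \]

This implies that $(1 - \alpha)^{1/M} \le 1 - \eta_\bullet/M$ which is equivalent to $\eta_\bullet \le M[1 - (1 - \alpha)^{1/M}]$.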
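
The two bounds of Lemma 7.2 can be checked numerically. The sketch below is illustrative only and rests on one assumption taken from the proof above: that $UB(C_\alpha)$ consists of the size vectors with $\prod_m (1 - \eta_m) = 1 - \alpha$. Such vectors are generated by drawing weights $w$ on the simplex and setting $\eta_m = 1 - (1 - \alpha)^{w_m}$; the function names are hypothetical.

```python
import numpy as np

def random_size_vector(alpha, M, rng):
    """Draw eta with prod(1 - eta_m) = 1 - alpha.

    Uses eta_m = 1 - (1 - alpha)**w_m with w on the unit simplex,
    so that prod(1 - eta_m) = (1 - alpha)**sum(w) = 1 - alpha.
    """
    w = rng.dirichlet(np.ones(M))
    return 1.0 - (1.0 - alpha) ** w

def lemma_bounds(alpha, M):
    """Lower and upper bounds on eta_bullet = sum_m eta_m from Lemma 7.2."""
    lower = alpha
    upper = min(-np.log1p(-alpha), M * (1.0 - (1.0 - alpha) ** (1.0 / M)))
    return lower, upper

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha, M = 0.10, 20
    eta = random_size_vector(alpha, M, rng)
    lower, upper = lemma_bounds(alpha, M)
    print(lower, eta.sum(), upper)
```

The lower bound is approached when nearly all of the size is placed on one test, and the upper bound is attained by the Sidak size vector with equal components.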
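Proposition 7.1 $\forall q^* \in [0,1]$, $1 - (1 - q^*/M)^M \le R_1(\delta^*_F(q^*), Q_0) \le q^*$.

Proof: It remains to show the left-hand inequality. We have that

\[ R_1(\delta^*_F(q^*), Q_0) = P_{Q_0}(\alpha^\dagger(q^*) > 0) \ge P_{Q_0}\{\eta_\bullet(W_{(1)}) \le q^*\}. \]

Using the first component of the upper bound for $\eta_\bullet$ in Lemma 7.2, we get $P_{Q_0}\{\eta_\bullet(W_{(1)}) \le
q^*\} \ge P_{Q_0}\{-\log(1 - W_{(1)}) \le q^*\} = 1 - \exp(-q^*)$ since $-\log(1 - W_{(1)})$ is unit exponential
owing to the standard uniformity of $W_{(1)}$ under $Q_0$. Using the second component in the
upper bound, we also get

\[ P_{Q_0}\{\eta_\bullet(W_{(1)}) \le q^*\} \ge P_{Q_0}\{M[1 - (1 - W_{(1)})^{1/M}] \le q^*\}
 = P_{Q_0}\{W_{(1)} \le 1 - (1 - q^*/M)^M\} = 1 - (1 - q^*/M)^M. \]

Thus, $R_1(\delta^*_F(q^*), Q_0) \ge \max\{1 - \exp(-q^*), 1 - (1 - q^*/M)^M\}$. But, as a referee had pointed
out, or which we see by noting that $v \in [0, \infty) \mapsto L(v) = M\log(1 - v/M) + v$ has $L(0) = 0$
and $L'(v) \le 0$ for $v \ge 0$, we have for $q^* \in [0, \infty)$, $1 - \exp(-q^*) \le 1 - (1 - q^*/M)^M$. Thus,
this completes the proof of the proposition. ||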
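
To see how tight the bounds of Proposition 7.1 are, the following Python sketch evaluates the two lower bounds appearing in the proof together with the upper bound $q^*$. The function name is illustrative and not from the paper; the code only evaluates the closed-form expressions and does not simulate the procedure.

```python
import numpy as np

def fdr_bounds(q, M):
    """Return (1 - exp(-q), 1 - (1 - q/M)^M, q) for q in [0, 1] and M >= 1."""
    lower_exp = 1.0 - np.exp(-q)
    lower_pow = 1.0 - (1.0 - q / M) ** M
    return lower_exp, lower_pow, q

if __name__ == "__main__":
    for M in (20, 50, 100):
        lower_exp, lower_pow, upper = fdr_bounds(0.10, M)
        # The gap upper - lower_pow bounds the conservativeness under Q_0.
        print(M, lower_exp, lower_pow, upper)
```

As $M$ grows, $1 - (1 - q^*/M)^M$ decreases toward $1 - \exp(-q^*)$, so under $Q_0$ the FDR of the procedure lies within roughly $q^* - (1 - \exp(-q^*))$ of the nominal level.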

8 A Modest Simulation 

In this section we present results of a modest simulation comparing the performance of
$\delta^*_F$ and $\delta^{BH}$ in terms of FDR and MDR. The results here are limited to demonstrating
numerically, in a specific Gaussian model, that $\delta^*_F$ achieves the desired FDR-control, as does
$\delta^{BH}$, and that $\delta^*_F$ achieves a lower MDR relative to $\delta^{BH}$.

The simulation model is similar to the Gaussian example illustrating the optimal weak
FWER-controlling procedure. In this model, for each $m \in \mathcal{M}$, the observables are $X_m \sim
N(\mu_m, 1)$ which are independently generated. The $m$th pair of hypotheses is $H_{m0}: \mu_m \le 0$
versus $H_{m1}: \mu_m > 0$ with UMP size-$\eta_m$ test of form $\delta_m(X_m; \eta_m) = I\{X_m > \Phi^{-1}(1 - \eta_m)\}$.

The true values of the means $\mu_m$s are $\mu_m = \theta_m \gamma_m$, $m \in \mathcal{M}$, with $\theta_m \sim Ber(p)$ and effect sizes
$\gamma_m \sim |N(\nu, 1)|$, which are again independently generated from each other. In the simulation,
the parameter combinations were induced by taking the number of pairs of hypotheses $M \in
\{20, 50, 100\}$, the proportion of true alternative hypotheses $p \in \{.1, .2, .4\}$, and the mean of
the effect size-generating normal distribution $\nu \in \{1, 2, 4\}$. In implementing $\delta^*_F$ and $\delta^{BH}$,
we used an FDR-threshold of $q^* \in \{.05, .10\}$. Since the computational implementation of
$\delta^*_F$ takes time, for each combination of $(q^*, M, \nu, p)$, we only replicated the basic experiment
1000 times. For each simulation parameter combination, the simulated FDR and MDR*
were the averages of the observed FDR and the standardized MDR* = MDR/$|\mathcal{M}_1(Q)|$
over the 1000 replications. For summarization purposes, we used this standardized MDR
since, for each replicate, a $Q$ is generated, hence $|\mathcal{M}_1(Q)|$ may differ over the simulation
replications. Thus, in essence, note that we are comparing the averages of $R_2(\delta^*_F, Q)/|\mathcal{M}_1(Q)|$
and $R_2(\delta^{BH}, Q)/|\mathcal{M}_1(Q)|$, where the averaging is with respect to the mechanism generating
the $Q$s over the simulation replications.
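
The data-generating mechanism and the two summary measures can be sketched in Python as follows. This is a hedged illustration covering the BH procedure only, since the computation of the optimal size vectors needed for the new MDF is not reproduced here. Several choices are assumptions of this sketch rather than details stated in the paper: effect sizes are drawn as $|N(\nu, 1)|$, p-values are the one-sided $1 - \Phi(X_m)$, the observed FDR of a replicate is taken as false rejections over $\max(\text{rejections}, 1)$, and replicates with no true alternatives are skipped when averaging MDR*.

```python
import numpy as np
from math import erfc, sqrt

def bh_reject(pvals, q):
    """Boolean rejection vector of the BH step-up rule at level q."""
    pvals = np.asarray(pvals, dtype=float)
    M = pvals.size
    order = np.argsort(pvals, kind="stable")
    ok = np.nonzero(pvals[order] <= q * np.arange(1, M + 1) / M)[0]
    reject = np.zeros(M, dtype=bool)
    if ok.size:
        reject[order[: ok[-1] + 1]] = True
    return reject

def simulate_bh(M, p, nu, q, reps, seed=0):
    """Average observed FDR and standardized MDR* of BH over `reps` replicates."""
    rng = np.random.default_rng(seed)
    fdp, mdr_star = [], []
    for _ in range(reps):
        theta = rng.random(M) < p                     # indicators of true alternatives
        gamma = np.abs(rng.normal(nu, 1.0, size=M))   # effect sizes (assumed |N(nu, 1)|)
        x = rng.normal(theta * gamma, 1.0)            # X_m ~ N(mu_m, 1), mu_m = theta_m * gamma_m
        pvals = np.array([0.5 * erfc(v / sqrt(2.0)) for v in x])  # 1 - Phi(x)
        reject = bh_reject(pvals, q)
        fdp.append((reject & ~theta).sum() / max(reject.sum(), 1))
        if theta.any():                               # MDR* undefined without alternatives
            mdr_star.append((~reject & theta).sum() / theta.sum())
    mdr = float(np.mean(mdr_star)) if mdr_star else float("nan")
    return float(np.mean(fdp)), mdr

if __name__ == "__main__":
    fdr, mdr = simulate_bh(M=20, p=0.2, nu=2.0, q=0.10, reps=1000)
    print(f"FDR = {100 * fdr:.2f}%, MDR* = {100 * mdr:.2f}%")
```

A comparison with the new MDF would replace `bh_reject` by a rule built on the optimal size vectors while keeping the same replicate loop and summary measures.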

We only report the results for $q^* = 0.10$ in Table 2 since the results for $q^* = 0.05$ lead
to similar conclusions. From this table we observe that both $\delta^*_F$ and $\delta^{BH}$ fulfill the
FDR-constraint, and in fact this happens in a conservative fashion, which is as expected from
theory. More importantly, the MDR-performance of $\delta^*_F$ is better compared to that of $\delta^{BH}$,
and this dominance holds true for all the twenty-seven simulation parameter combinations
considered. Observe that as $M$ is increased with $(\nu, p)$ remaining the same, there is an
increase in their MDR*s; whereas, when $\nu$ is increased, which has the effect of increasing
the effect sizes, their MDR*s decrease. Interestingly, the impact of a change of value in $p$,
the proportion of true alternative hypotheses, did not necessarily translate into a monotone
change in their MDR*s, especially when $M = 20$, though for the larger $M$-values, the change
in MDR* appears monotonically decreasing.

It may appear that the standardized improvement of $\delta^*_F$ over $\delta^{BH}$ is minuscule based on
the results of this simulation study. However, it should be noted that when translated to
the overall number of discoveries, when $M$ is large, $\delta^*_F$ will lead to many more discoveries
than $\delta^{BH}$ while still maintaining the desired FDR control. Such an increase in the number
of discoveries may have important practical implications, such as enlarging the number of
genes that will be explored in subsequent studies, hence increasing the chances of finding
crucial and important genes, but without sacrificing the Type I error rate.

9 Concluding Remarks 

This paper provides some resolution on the role of the individual powers, or more appro- 
priately the ROC functions of decision processes, in multiple hypotheses testing problems. 
The importance and relevance of these problems is evident as witnessed by the explosion 
in the number of research papers that were published, and certainly those that were not 
published, on this subject in the last few years. A primary impetus for this development is 
the urgent need to deal with the proliferation of high-dimensional "large M, small n" data 
sets in the natural, medical, physical, economic, and social sciences, which are being created 
or generated due to advances in high-throughput technology, the latter fueled by speedy de- 
velopments in computer technology and miniaturization. This is embodied and spearheaded 
by, but not limited to, microarray technology. 
Table 2: Comparison of the false discovery rate (FDR) and standardized missed discovery
rate (MDR*) performance of MHTDF $\delta^*_F$ and $\delta^{BH}$ under a variety of simulation parameters.
This table is for $q^* = .10$. The FDR and MDR* are in percentages. The number of
replications is 1000.

Almost a century ago, Neyman and Pearson demonstrated the need to take into account
the power function, and the alternative hypothesis configuration, when one is seeking an 
optimal test procedure in the one-pair hypothesis testing problem. Their work led to a 
divorce from the then-existing significance or p-value approach. Currently, many multiple 
hypotheses testing procedures, epitomized by the Sidak procedures for weak and strong 
control of the FWER, and by the well-known Benjamini-Hochberg (BH) procedure for control 
of the FDR, are based on the p-values of the individual tests and do not seem to consider 
possible differences in the powers of the individual tests. They are appropriate in the so- 
called exchangeable setting wherein powers of the individual tests are identical. 

In this paper we examined the question of whether differences in power characteristics 
of the individual tests could be exploited to improve on existing procedures for FWER 
and FDR control. This was done in a general decision-theoretic framework to allow for 
results that are applicable even with complicated data types and structures, and in the 
most fundamental setting where each pair of hypotheses consists of a simple null and a 
simple alternative hypothesis. First, an optimal MDF within the class of simple MDFs 
was shown to exist for weak FWER control. This MDF exploits differences in the power 
characteristics of the individual tests. In particular, this MDF is better than the Sidak weak 
FWER-controlling MDF, though the latter is a special case of the optimal MDF arising
under the exchangeable setting. The resulting theory also informs us regarding an optimal 
size-investing strategy. Second, by using this optimal, though still restricted, MDF as an 
anchor, we developed a compound MDF which strongly controls the FWER. The sequential 
Sidak MDF is a special case of this MDF, arising under the exchangeable setting. We then 
developed a compound MDF that (strongly) controls the FDR. The BH FDR-controlling 
MDF is a special case, arising under exchangeability. These new MDFs, by virtue of their 
construction, are expected to have smaller MDRs compared to those which do not exploit 
power differences. This was demonstrated through a modest simulation study for the new 
FDR-controlling MDF. The MDFs were also related and contrasted with other compound 
MDFs, notably the ODP in [50], and those using weighted p-values.

Though the proposed MDFs do improve on existing ones developed under the exchange- 
able setting, we could not claim that they are optimal among all compound MDFs for control 
of FWER or FDR. This question of global optimality appears to be a difficult and elusive 
problem. So far none of the existing compound MDFs, such as the estimated ODP in [50], 
could claim global optimality. In our case, the possible drawback is the fact that in the 
construction of these new MDFs, the starting point is the class of simple MDFs. Indeed, 
the resulting MDFs are compound, but the issue of establishing global optimality is not 
transparent. In fact, a question even arises as to whether there actually exists an optimal
MDF among all compound MDFs that, say, control the FDR. One thing certain about our 
proposed MDFs is that they do satisfy the desired FWER or FDR constraints. Other MDFs, 
obtained by plugging in estimators, and hence adaptive, or which utilize prior information,
may lose their optimality property or may no longer satisfy the desired Type I error constraints
after the plug-in step. See [53] where optimality was in an asymptotic sense and
with the Type I error rate being the mFDR, as well as [TB] and 01] for more discussions on
these issues.

A natural layer to add in the decision-theoretic formulation of the problem is a Bayesian 
layer where a prior measure is specified on the unknown probability measure Q or, alterna- 
tively, on 9{Q). There is a possibility that through this Bayesian approach, one may be able 
to obtain a characterization of the class of optimal MDFs controlling Type I error rates, or 
when the two types of error rates are combined, for example, via a weighted linear combination.
The papers [HI, [39], [12], [13], which employ Bayes or empirical Bayes approaches, are
certainly highly relevant.

Finally, we mention that there are still other aspects of the multiple decision problem not 
dealt with in this paper. The first one is the extension to situations with composite null and 
alternative hypotheses, or how to adapt the new MDFs to such settings. We indicated some 
ideas in subsection 5.2 for distributional models possessing the MLR property, but clearly
more extensive studies are needed. The second one is that of possible dependencies among
the components of $(X_m, m \in \mathcal{M}_0(Q))$. In the setting we considered, it was assumed that this
is an independent collection according to condition (I), but it would be of interest to obtain
results under certain types of dependencies. Potential results in such scenarios will extend
those in [37] EH] and [3]. In both of these settings, resampling-based ideas and approaches,
such as the use of permutational distributions, which were developed and implemented, for
instance, in [57] and [56], will be highly relevant.